* [PATCH 0/8] migrate_pages(): batch TLB flushing
@ 2022-12-27  0:28 Huang Ying
From: Huang Ying @ 2022-12-27  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang, Ying, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	Alistair Popple, haoxin

From: "Huang, Ying" <ying.huang@intel.com>

Currently, migrate_pages() migrates folios one by one, as in the
following pseudo-code:

  for each folio
    unmap
    flush TLB
    copy
    restore map

If multiple folios are passed to migrate_pages(), there are
opportunities to batch the TLB flushing and copying.  That is, we can
change the code to something like the following:

  for each folio
    unmap
  for each folio
    flush TLB
  for each folio
    copy
  for each folio
    restore map

This can reduce the total number of TLB flushing IPIs considerably.
It also makes it possible to use a hardware accelerator, such as DSA,
to speed up the folio copying.

So in this patchset, we refactor the migrate_pages() implementation
and implement batched TLB flushing.  Based on this, hardware
accelerated folio copying can be implemented.

In the naive batched implementation, if too many folios are passed to
migrate_pages(), too many folios may be unmapped at the same time.
This increases the chance that a task has to wait for the migrated
folios to be mapped again, which hurts latency.  To deal with this
issue, the number of pages unmapped in one batch is restricted to no
more than HPAGE_PMD_NR.  That is, the impact is at the same level as
THP migration.
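
For reference, after this series the overall structure of
migrate_pages() looks roughly like the following.  This is only a
condensed sketch of the code added in patches 2, 3 and 5 (retries,
error handling and statistics are omitted), not the exact
implementation:

  int migrate_pages(struct list_head *from, ...)
  {
          /* hugetlb folios keep a separate, non-batched path (patch 2) */
          migrate_hugetlbs(from, ...);

          /* the rest is migrated in batches of at most
           * HPAGE_PMD_NR pages (patch 3)
           */
          while (!list_empty(from)) {
                  /* cut at most NR_MAX_BATCHED_MIGRATION pages
                   * worth of folios from "from" into "folios"
                   */
                  migrate_pages_batch(&folios, ...);
          }
  }

  static int migrate_pages_batch(struct list_head *from, ...)
  {
          /* patch 5: unmap all folios of the batch first ... */
          list_for_each_entry_safe(folio, folio2, from, lru)
                  migrate_folio_unmap(...);  /* pairs go to unmap_folios/dst_folios */

          /* ... then copy and remap them all */
          list_for_each_entry_safe(folio, folio2, &unmap_folios, lru)
                  migrate_folio_move(...);
  }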

We used the following test to measure the performance impact of the
patchset:

On a 2-socket Intel server,

 - Run pmbench memory accessing benchmark

 - Run `migratepages` to migrate pages of pmbench between node 0 and
   node 1 back and forth.

With the patchset, the number of TLB flushing IPIs is reduced by
99.1% during the test, and the number of pages migrated successfully
per second increases by 291.7%.

This patchset is based on mm-unstable.

Changes:

from rfc to v1:

- Rebased on v6.2-rc1

- Fix the deadlock issue caused by locking multiple pages synchronously
  per Alistair's comments.  Thanks!

- Fix the autonumabench panic per Rao's comments and fix.  Thanks!

- Other minor fixes per comments. Thanks!

Best Regards,
Huang, Ying


* [PATCH 1/8] migrate_pages: organize stats with struct migrate_pages_stats
From: Huang Ying @ 2022-12-27  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	Alistair Popple, haoxin

Define struct migrate_pages_stats to organize the various statistics
in migrate_pages().  This makes it easier to collect and consume the
statistics in multiple functions.  This will be needed in the
following patches in the series.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: haoxin <xhao@linux.alibaba.com>
---
 mm/migrate.c | 58 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index a4d3fc65085f..ec9263a33d38 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1396,6 +1396,14 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
 	return rc;
 }
 
+struct migrate_pages_stats {
+	int nr_succeeded;
+	int nr_failed_pages;
+	int nr_thp_succeeded;
+	int nr_thp_failed;
+	int nr_thp_split;
+};
+
 /*
  * migrate_pages - migrate the folios specified in a list, to the free folios
  *		   supplied as the target for the page migration
@@ -1430,13 +1438,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	int large_retry = 1;
 	int thp_retry = 1;
 	int nr_failed = 0;
-	int nr_failed_pages = 0;
 	int nr_retry_pages = 0;
-	int nr_succeeded = 0;
-	int nr_thp_succeeded = 0;
 	int nr_large_failed = 0;
-	int nr_thp_failed = 0;
-	int nr_thp_split = 0;
 	int pass = 0;
 	bool is_large = false;
 	bool is_thp = false;
@@ -1446,9 +1449,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	LIST_HEAD(split_folios);
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
 	bool no_split_folio_counting = false;
+	struct migrate_pages_stats stats;
 
 	trace_mm_migrate_pages_start(mode, reason);
 
+	memset(&stats, 0, sizeof(stats));
 split_folio_migration:
 	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
 		retry = 0;
@@ -1502,9 +1507,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				/* Large folio migration is unsupported */
 				if (is_large) {
 					nr_large_failed++;
-					nr_thp_failed += is_thp;
+					stats.nr_thp_failed += is_thp;
 					if (!try_split_folio(folio, &split_folios)) {
-						nr_thp_split += is_thp;
+						stats.nr_thp_split += is_thp;
 						break;
 					}
 				/* Hugetlb migration is unsupported */
@@ -1512,7 +1517,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 					nr_failed++;
 				}
 
-				nr_failed_pages += nr_pages;
+				stats.nr_failed_pages += nr_pages;
 				list_move_tail(&folio->lru, &ret_folios);
 				break;
 			case -ENOMEM:
@@ -1522,13 +1527,13 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				 */
 				if (is_large) {
 					nr_large_failed++;
-					nr_thp_failed += is_thp;
+					stats.nr_thp_failed += is_thp;
 					/* Large folio NUMA faulting doesn't split to retry. */
 					if (!nosplit) {
 						int ret = try_split_folio(folio, &split_folios);
 
 						if (!ret) {
-							nr_thp_split += is_thp;
+							stats.nr_thp_split += is_thp;
 							break;
 						} else if (reason == MR_LONGTERM_PIN &&
 							   ret == -EAGAIN) {
@@ -1546,7 +1551,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 					nr_failed++;
 				}
 
-				nr_failed_pages += nr_pages + nr_retry_pages;
+				stats.nr_failed_pages += nr_pages + nr_retry_pages;
 				/*
 				 * There might be some split folios of fail-to-migrate large
 				 * folios left in split_folios list. Move them back to migration
@@ -1556,7 +1561,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				list_splice_init(&split_folios, from);
 				/* nr_failed isn't updated for not used */
 				nr_large_failed += large_retry;
-				nr_thp_failed += thp_retry;
+				stats.nr_thp_failed += thp_retry;
 				goto out;
 			case -EAGAIN:
 				if (is_large) {
@@ -1568,8 +1573,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				nr_retry_pages += nr_pages;
 				break;
 			case MIGRATEPAGE_SUCCESS:
-				nr_succeeded += nr_pages;
-				nr_thp_succeeded += is_thp;
+				stats.nr_succeeded += nr_pages;
+				stats.nr_thp_succeeded += is_thp;
 				break;
 			default:
 				/*
@@ -1580,20 +1585,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				 */
 				if (is_large) {
 					nr_large_failed++;
-					nr_thp_failed += is_thp;
+					stats.nr_thp_failed += is_thp;
 				} else if (!no_split_folio_counting) {
 					nr_failed++;
 				}
 
-				nr_failed_pages += nr_pages;
+				stats.nr_failed_pages += nr_pages;
 				break;
 			}
 		}
 	}
 	nr_failed += retry;
 	nr_large_failed += large_retry;
-	nr_thp_failed += thp_retry;
-	nr_failed_pages += nr_retry_pages;
+	stats.nr_thp_failed += thp_retry;
+	stats.nr_failed_pages += nr_retry_pages;
 	/*
 	 * Try to migrate split folios of fail-to-migrate large folios, no
 	 * nr_failed counting in this round, since all split folios of a
@@ -1626,16 +1631,17 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	if (list_empty(from))
 		rc = 0;
 
-	count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
-	count_vm_events(PGMIGRATE_FAIL, nr_failed_pages);
-	count_vm_events(THP_MIGRATION_SUCCESS, nr_thp_succeeded);
-	count_vm_events(THP_MIGRATION_FAIL, nr_thp_failed);
-	count_vm_events(THP_MIGRATION_SPLIT, nr_thp_split);
-	trace_mm_migrate_pages(nr_succeeded, nr_failed_pages, nr_thp_succeeded,
-			       nr_thp_failed, nr_thp_split, mode, reason);
+	count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded);
+	count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages);
+	count_vm_events(THP_MIGRATION_SUCCESS, stats.nr_thp_succeeded);
+	count_vm_events(THP_MIGRATION_FAIL, stats.nr_thp_failed);
+	count_vm_events(THP_MIGRATION_SPLIT, stats.nr_thp_split);
+	trace_mm_migrate_pages(stats.nr_succeeded, stats.nr_failed_pages,
+			       stats.nr_thp_succeeded, stats.nr_thp_failed,
+			       stats.nr_thp_split, mode, reason);
 
 	if (ret_succeeded)
-		*ret_succeeded = nr_succeeded;
+		*ret_succeeded = stats.nr_succeeded;
 
 	return rc;
 }
-- 
2.35.1



* [PATCH 2/8] migrate_pages: separate hugetlb folios migration
From: Huang Ying @ 2022-12-27  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	Alistair Popple, haoxin

This is a preparation patch for batching the folio unmapping and
moving of non-hugetlb folios.  Based on that, we can batch the TLB
shootdown during folio migration and make it possible to use a
hardware accelerator for the folio copying.

In this patch, the migration of hugetlb folios and non-hugetlb folios
is separated in migrate_pages(), to make it easy to change the
non-hugetlb folio migration implementation.
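
After this patch, the top of migrate_pages() looks like the following
(condensed from the hunk below): hugetlb folios are migrated first in
their own retry loop, and the existing main loop only deals with
non-hugetlb folios afterwards:

  memset(&stats, 0, sizeof(stats));
  rc = migrate_hugetlbs(from, get_new_page, put_new_page, private,
                        mode, reason, &stats, &ret_folios);
  if (rc < 0)
          goto out;
  nr_failed = rc;

  /* in the main loop, any hugetlb folio still left on "from" is
   * simply moved to ret_folios:
   */
  if (folio_test_hugetlb(folio)) {
          list_move_tail(&folio->lru, &ret_folios);
          continue;
  }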

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: haoxin <xhao@linux.alibaba.com>
---
 mm/migrate.c | 114 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 99 insertions(+), 15 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index ec9263a33d38..bdbe73fe2eb7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1404,6 +1404,87 @@ struct migrate_pages_stats {
 	int nr_thp_split;
 };
 
+static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
+			    free_page_t put_new_page, unsigned long private,
+			    enum migrate_mode mode, int reason,
+			    struct migrate_pages_stats *stats,
+			    struct list_head *ret_folios)
+{
+	int retry = 1;
+	int nr_failed = 0;
+	int nr_retry_pages = 0;
+	int pass = 0;
+	struct folio *folio, *folio2;
+	int rc = 0, nr_pages;
+
+	for (pass = 0; pass < 10 && retry; pass++) {
+		retry = 0;
+		nr_retry_pages = 0;
+
+		list_for_each_entry_safe(folio, folio2, from, lru) {
+			if (!folio_test_hugetlb(folio))
+				continue;
+
+			nr_pages = folio_nr_pages(folio);
+
+			cond_resched();
+
+			rc = unmap_and_move_huge_page(get_new_page,
+						      put_new_page, private,
+						      &folio->page, pass > 2, mode,
+						      reason, ret_folios);
+			/*
+			 * The rules are:
+			 *	Success: hugetlb folio will be put back
+			 *	-EAGAIN: stay on the from list
+			 *	-ENOMEM: stay on the from list
+			 *	-ENOSYS: stay on the from list
+			 *	Other errno: put on ret_folios list
+			 */
+			switch(rc) {
+			case -ENOSYS:
+				/* Hugetlb migration is unsupported */
+				nr_failed++;
+				stats->nr_failed_pages += nr_pages;
+				list_move_tail(&folio->lru, ret_folios);
+				break;
+			case -ENOMEM:
+				/*
+				 * When memory is low, don't bother to try to migrate
+				 * other folios, just exit.
+				 */
+				nr_failed++;
+				stats->nr_failed_pages += nr_pages;
+				goto out;
+			case -EAGAIN:
+				retry++;
+				nr_retry_pages += nr_pages;
+				break;
+			case MIGRATEPAGE_SUCCESS:
+				stats->nr_succeeded += nr_pages;
+				break;
+			default:
+				/*
+				 * Permanent failure (-EBUSY, etc.):
+				 * unlike -EAGAIN case, the failed folio is
+				 * removed from migration folio list and not
+				 * retried in the next outer loop.
+				 */
+				nr_failed++;
+				stats->nr_failed_pages += nr_pages;
+				break;
+			}
+		}
+	}
+out:
+	nr_failed += retry;
+	stats->nr_failed_pages += nr_retry_pages;
+	if (rc != -ENOMEM)
+		rc = nr_failed;
+
+	return rc;
+}
+
 /*
  * migrate_pages - migrate the folios specified in a list, to the free folios
  *		   supplied as the target for the page migration
@@ -1437,7 +1518,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	int retry = 1;
 	int large_retry = 1;
 	int thp_retry = 1;
-	int nr_failed = 0;
+	int nr_failed;
 	int nr_retry_pages = 0;
 	int nr_large_failed = 0;
 	int pass = 0;
@@ -1454,6 +1535,12 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	trace_mm_migrate_pages_start(mode, reason);
 
 	memset(&stats, 0, sizeof(stats));
+	rc = migrate_hugetlbs(from, get_new_page, put_new_page, private, mode, reason,
+			      &stats, &ret_folios);
+	if (rc < 0)
+		goto out;
+	nr_failed = rc;
+
 split_folio_migration:
 	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
 		retry = 0;
@@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 		nr_retry_pages = 0;
 
 		list_for_each_entry_safe(folio, folio2, from, lru) {
+			if (folio_test_hugetlb(folio)) {
+				list_move_tail(&folio->lru, &ret_folios);
+				continue;
+			}
+
 			/*
 			 * Large folio statistics is based on the source large
 			 * folio. Capture required information that might get
 			 * lost during migration.
 			 */
-			is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
+			is_large = folio_test_large(folio);
 			is_thp = is_large && folio_test_pmd_mappable(folio);
 			nr_pages = folio_nr_pages(folio);
+
 			cond_resched();
 
-			if (folio_test_hugetlb(folio))
-				rc = unmap_and_move_huge_page(get_new_page,
-						put_new_page, private,
-						&folio->page, pass > 2, mode,
-						reason,
-						&ret_folios);
-			else
-				rc = unmap_and_move(get_new_page, put_new_page,
-						private, folio, pass > 2, mode,
-						reason, &ret_folios);
+			rc = unmap_and_move(get_new_page, put_new_page,
+					    private, folio, pass > 2, mode,
+					    reason, &ret_folios);
 			/*
 			 * The rules are:
-			 *	Success: non hugetlb folio will be freed, hugetlb
-			 *		 folio will be put back
+			 *	Success: folio will be freed
 			 *	-EAGAIN: stay on the from list
 			 *	-ENOMEM: stay on the from list
 			 *	-ENOSYS: stay on the from list
@@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 						stats.nr_thp_split += is_thp;
 						break;
 					}
-				/* Hugetlb migration is unsupported */
 				} else if (!no_split_folio_counting) {
 					nr_failed++;
 				}
-- 
2.35.1



* [PATCH 3/8] migrate_pages: restrict number of pages to migrate in batch
From: Huang Ying @ 2022-12-27  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	Alistair Popple, haoxin

This is a preparation patch to batch the folio unmapping and moving
for non-hugetlb folios.

Once the folio unmapping is batched, all folios to be migrated are
unmapped before their contents and flags are copied.  If too many
pages' worth of folios were passed to migrate_pages(), the processes
mapping them could be stalled for a long time, hurting latency.  For
example, the migrate_pages() syscall will call migrate_pages() with
all folios of a process.  To avoid this possible issue, in this patch
we restrict the number of pages migrated in one batch to no more than
HPAGE_PMD_NR.  That is, the impact is at the same level as THP
migration.
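
The batching loop itself is a straightforward cut-and-migrate loop
(condensed from the diff below; hugetlb handling and the return value
bookkeeping are omitted):

  again:
          nr_pages = 0;
          list_for_each_entry_safe(folio, folio2, from, lru) {
                  nr_pages += folio_nr_pages(folio);
                  if (nr_pages > NR_MAX_BATCHED_MIGRATION)
                          break;
          }
          /* move at most NR_MAX_BATCHED_MIGRATION pages worth of
           * folios from "from" to the local "folios" list ...
           */
          if (nr_pages > NR_MAX_BATCHED_MIGRATION)
                  list_cut_before(&folios, from, &folio->lru);
          else
                  list_splice_init(from, &folios);
          /* ... and migrate them as one batch */
          rc = migrate_pages_batch(&folios, get_new_page, put_new_page,
                                   private, mode, reason, &ret_folios,
                                   &stats);
          if (!list_empty(from))
                  goto again;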

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: haoxin <xhao@linux.alibaba.com>
---
 mm/migrate.c | 173 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 106 insertions(+), 67 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index bdbe73fe2eb7..97ea0737ab2b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1485,40 +1485,15 @@ static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
 	return rc;
 }
 
-/*
- * migrate_pages - migrate the folios specified in a list, to the free folios
- *		   supplied as the target for the page migration
- *
- * @from:		The list of folios to be migrated.
- * @get_new_page:	The function used to allocate free folios to be used
- *			as the target of the folio migration.
- * @put_new_page:	The function used to free target folios if migration
- *			fails, or NULL if no special handling is necessary.
- * @private:		Private data to be passed on to get_new_page()
- * @mode:		The migration mode that specifies the constraints for
- *			folio migration, if any.
- * @reason:		The reason for folio migration.
- * @ret_succeeded:	Set to the number of folios migrated successfully if
- *			the caller passes a non-NULL pointer.
- *
- * The function returns after 10 attempts or if no folios are movable any more
- * because the list has become empty or no retryable folios exist any more.
- * It is caller's responsibility to call putback_movable_pages() to return folios
- * to the LRU or free list only if ret != 0.
- *
- * Returns the number of {normal folio, large folio, hugetlb} that were not
- * migrated, or an error code. The number of large folio splits will be
- * considered as the number of non-migrated large folio, no matter how many
- * split folios of the large folio are migrated successfully.
- */
-int migrate_pages(struct list_head *from, new_page_t get_new_page,
+static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 		free_page_t put_new_page, unsigned long private,
-		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
+		enum migrate_mode mode, int reason, struct list_head *ret_folios,
+		struct migrate_pages_stats *stats)
 {
 	int retry = 1;
 	int large_retry = 1;
 	int thp_retry = 1;
-	int nr_failed;
+	int nr_failed = 0;
 	int nr_retry_pages = 0;
 	int nr_large_failed = 0;
 	int pass = 0;
@@ -1526,20 +1501,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	bool is_thp = false;
 	struct folio *folio, *folio2;
 	int rc, nr_pages;
-	LIST_HEAD(ret_folios);
 	LIST_HEAD(split_folios);
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
 	bool no_split_folio_counting = false;
-	struct migrate_pages_stats stats;
-
-	trace_mm_migrate_pages_start(mode, reason);
-
-	memset(&stats, 0, sizeof(stats));
-	rc = migrate_hugetlbs(from, get_new_page, put_new_page, private, mode, reason,
-			      &stats, &ret_folios);
-	if (rc < 0)
-		goto out;
-	nr_failed = rc;
 
 split_folio_migration:
 	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
@@ -1549,11 +1513,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 		nr_retry_pages = 0;
 
 		list_for_each_entry_safe(folio, folio2, from, lru) {
-			if (folio_test_hugetlb(folio)) {
-				list_move_tail(&folio->lru, &ret_folios);
-				continue;
-			}
-
 			/*
 			 * Large folio statistics is based on the source large
 			 * folio. Capture required information that might get
@@ -1567,15 +1526,14 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 
 			rc = unmap_and_move(get_new_page, put_new_page,
 					    private, folio, pass > 2, mode,
-					    reason, &ret_folios);
+					    reason, ret_folios);
 			/*
 			 * The rules are:
 			 *	Success: folio will be freed
 			 *	-EAGAIN: stay on the from list
 			 *	-ENOMEM: stay on the from list
 			 *	-ENOSYS: stay on the from list
-			 *	Other errno: put on ret_folios list then splice to
-			 *		     from list
+			 *	Other errno: put on ret_folios list
 			 */
 			switch(rc) {
 			/*
@@ -1592,17 +1550,17 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				/* Large folio migration is unsupported */
 				if (is_large) {
 					nr_large_failed++;
-					stats.nr_thp_failed += is_thp;
+					stats->nr_thp_failed += is_thp;
 					if (!try_split_folio(folio, &split_folios)) {
-						stats.nr_thp_split += is_thp;
+						stats->nr_thp_split += is_thp;
 						break;
 					}
 				} else if (!no_split_folio_counting) {
 					nr_failed++;
 				}
 
-				stats.nr_failed_pages += nr_pages;
-				list_move_tail(&folio->lru, &ret_folios);
+				stats->nr_failed_pages += nr_pages;
+				list_move_tail(&folio->lru, ret_folios);
 				break;
 			case -ENOMEM:
 				/*
@@ -1611,13 +1569,13 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				 */
 				if (is_large) {
 					nr_large_failed++;
-					stats.nr_thp_failed += is_thp;
+					stats->nr_thp_failed += is_thp;
 					/* Large folio NUMA faulting doesn't split to retry. */
 					if (!nosplit) {
 						int ret = try_split_folio(folio, &split_folios);
 
 						if (!ret) {
-							stats.nr_thp_split += is_thp;
+							stats->nr_thp_split += is_thp;
 							break;
 						} else if (reason == MR_LONGTERM_PIN &&
 							   ret == -EAGAIN) {
@@ -1635,17 +1593,17 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 					nr_failed++;
 				}
 
-				stats.nr_failed_pages += nr_pages + nr_retry_pages;
+				stats->nr_failed_pages += nr_pages + nr_retry_pages;
 				/*
 				 * There might be some split folios of fail-to-migrate large
-				 * folios left in split_folios list. Move them back to migration
+				 * folios left in split_folios list. Move them to ret_folios
 				 * list so that they could be put back to the right list by
 				 * the caller otherwise the folio refcnt will be leaked.
 				 */
-				list_splice_init(&split_folios, from);
+				list_splice_init(&split_folios, ret_folios);
 				/* nr_failed isn't updated for not used */
 				nr_large_failed += large_retry;
-				stats.nr_thp_failed += thp_retry;
+				stats->nr_thp_failed += thp_retry;
 				goto out;
 			case -EAGAIN:
 				if (is_large) {
@@ -1657,8 +1615,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				nr_retry_pages += nr_pages;
 				break;
 			case MIGRATEPAGE_SUCCESS:
-				stats.nr_succeeded += nr_pages;
-				stats.nr_thp_succeeded += is_thp;
+				stats->nr_succeeded += nr_pages;
+				stats->nr_thp_succeeded += is_thp;
 				break;
 			default:
 				/*
@@ -1669,20 +1627,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				 */
 				if (is_large) {
 					nr_large_failed++;
-					stats.nr_thp_failed += is_thp;
+					stats->nr_thp_failed += is_thp;
 				} else if (!no_split_folio_counting) {
 					nr_failed++;
 				}
 
-				stats.nr_failed_pages += nr_pages;
+				stats->nr_failed_pages += nr_pages;
 				break;
 			}
 		}
 	}
 	nr_failed += retry;
 	nr_large_failed += large_retry;
-	stats.nr_thp_failed += thp_retry;
-	stats.nr_failed_pages += nr_retry_pages;
+	stats->nr_thp_failed += thp_retry;
+	stats->nr_failed_pages += nr_retry_pages;
 	/*
 	 * Try to migrate split folios of fail-to-migrate large folios, no
 	 * nr_failed counting in this round, since all split folios of a
@@ -1693,7 +1651,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 		 * Move non-migrated folios (after 10 retries) to ret_folios
 		 * to avoid migrating them again.
 		 */
-		list_splice_init(from, &ret_folios);
+		list_splice_init(from, ret_folios);
 		list_splice_init(&split_folios, from);
 		no_split_folio_counting = true;
 		retry = 1;
@@ -1701,6 +1659,87 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	}
 
 	rc = nr_failed + nr_large_failed;
+out:
+	return rc;
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define NR_MAX_BATCHED_MIGRATION	HPAGE_PMD_NR
+#else
+#define NR_MAX_BATCHED_MIGRATION	512
+#endif
+
+/*
+ * migrate_pages - migrate the folios specified in a list, to the free folios
+ *		   supplied as the target for the page migration
+ *
+ * @from:		The list of folios to be migrated.
+ * @get_new_page:	The function used to allocate free folios to be used
+ *			as the target of the folio migration.
+ * @put_new_page:	The function used to free target folios if migration
+ *			fails, or NULL if no special handling is necessary.
+ * @private:		Private data to be passed on to get_new_page()
+ * @mode:		The migration mode that specifies the constraints for
+ *			folio migration, if any.
+ * @reason:		The reason for folio migration.
+ * @ret_succeeded:	Set to the number of folios migrated successfully if
+ *			the caller passes a non-NULL pointer.
+ *
+ * The function returns after 10 attempts or if no folios are movable any more
+ * because the list has become empty or no retryable folios exist any more.
+ * It is caller's responsibility to call putback_movable_pages() to return folios
+ * to the LRU or free list only if ret != 0.
+ *
+ * Returns the number of {normal folio, large folio, hugetlb} that were not
+ * migrated, or an error code. The number of large folio splits will be
+ * considered as the number of non-migrated large folio, no matter how many
+ * split folios of the large folio are migrated successfully.
+ */
+int migrate_pages(struct list_head *from, new_page_t get_new_page,
+		free_page_t put_new_page, unsigned long private,
+		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
+{
+	int rc, rc_gether;
+	int nr_pages;
+	struct folio *folio, *folio2;
+	LIST_HEAD(folios);
+	LIST_HEAD(ret_folios);
+	struct migrate_pages_stats stats;
+
+	trace_mm_migrate_pages_start(mode, reason);
+
+	memset(&stats, 0, sizeof(stats));
+
+	rc_gether = migrate_hugetlbs(from, get_new_page, put_new_page, private,
+				     mode, reason, &stats, &ret_folios);
+	if (rc_gether < 0)
+		goto out;
+again:
+	nr_pages = 0;
+	list_for_each_entry_safe(folio, folio2, from, lru) {
+		if (folio_test_hugetlb(folio)) {
+			list_move_tail(&folio->lru, &ret_folios);
+			continue;
+		}
+
+		nr_pages += folio_nr_pages(folio);
+		if (nr_pages > NR_MAX_BATCHED_MIGRATION)
+			break;
+	}
+	if (nr_pages > NR_MAX_BATCHED_MIGRATION)
+		list_cut_before(&folios, from, &folio->lru);
+	else
+		list_splice_init(from, &folios);
+	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
+				 mode, reason, &ret_folios, &stats);
+	list_splice_tail_init(&folios, &ret_folios);
+	if (rc < 0) {
+		rc_gether = rc;
+		goto out;
+	}
+	rc_gether += rc;
+	if (!list_empty(from))
+		goto again;
 out:
 	/*
 	 * Put the permanent failure folio back to migration list, they
@@ -1713,7 +1752,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	 * are migrated successfully.
 	 */
 	if (list_empty(from))
-		rc = 0;
+		rc_gether = 0;
 
 	count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded);
 	count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages);
@@ -1727,7 +1766,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	if (ret_succeeded)
 		*ret_succeeded = stats.nr_succeeded;
 
-	return rc;
+	return rc_gether;
 }
 
 struct page *alloc_migration_target(struct page *page, unsigned long private)
-- 
2.35.1



* [PATCH 4/8] migrate_pages: split unmap_and_move() to _unmap() and _move()
From: Huang Ying @ 2022-12-27  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	Alistair Popple, haoxin

This is a preparation patch to batch the folio unmapping and moving.

In this patch, unmap_and_move() is split into migrate_folio_unmap()
and migrate_folio_move(), so that _unmap() and _move() can be batched
in separate loops later.  To pass some information between the unmap
and move steps, the otherwise unused dst->mapping and dst->private
fields are used.
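
Concretely, the state needed by _move() is stashed in the destination
folio at the end of _unmap() and read back (and cleared) at the start
of _move(), via the two small helpers added below (condensed from the
diff):

  static void __migrate_folio_record(struct folio *dst,
                                     unsigned long page_was_mapped,
                                     struct anon_vma *anon_vma)
  {
          dst->mapping = (struct address_space *)anon_vma;
          dst->private = (void *)page_was_mapped;
  }

  /* __migrate_folio_extract() does the reverse: it recovers
   * page_was_mapped and anon_vma from dst and resets dst->mapping
   * and dst->private to NULL.
   */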

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: haoxin <xhao@linux.alibaba.com>
---
 include/linux/migrate.h |   1 +
 mm/migrate.c            | 162 +++++++++++++++++++++++++++++-----------
 2 files changed, 121 insertions(+), 42 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3ef77f52a4f0..7376074f2e1e 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -18,6 +18,7 @@ struct migration_target_control;
  * - zero on page migration success;
  */
 #define MIGRATEPAGE_SUCCESS		0
+#define MIGRATEPAGE_UNMAP		1
 
 /**
  * struct movable_operations - Driver page migration
diff --git a/mm/migrate.c b/mm/migrate.c
index 97ea0737ab2b..e2383b430932 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1009,11 +1009,29 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
 	return rc;
 }
 
-static int __unmap_and_move(struct folio *src, struct folio *dst,
+static void __migrate_folio_record(struct folio *dst,
+				   unsigned long page_was_mapped,
+				   struct anon_vma *anon_vma)
+{
+	dst->mapping = (struct address_space *)anon_vma;
+	dst->private = (void *)page_was_mapped;
+}
+
+static void __migrate_folio_extract(struct folio *dst,
+				   int *page_was_mappedp,
+				   struct anon_vma **anon_vmap)
+{
+	*anon_vmap = (struct anon_vma *)dst->mapping;
+	*page_was_mappedp = (unsigned long)dst->private;
+	dst->mapping = NULL;
+	dst->private = NULL;
+}
+
+static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
 				int force, enum migrate_mode mode)
 {
 	int rc = -EAGAIN;
-	bool page_was_mapped = false;
+	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
 	bool is_lru = !__PageMovable(&src->page);
 
@@ -1089,8 +1107,8 @@ static int __unmap_and_move(struct folio *src, struct folio *dst,
 		goto out_unlock;
 
 	if (unlikely(!is_lru)) {
-		rc = move_to_new_folio(dst, src, mode);
-		goto out_unlock_both;
+		__migrate_folio_record(dst, page_was_mapped, anon_vma);
+		return MIGRATEPAGE_UNMAP;
 	}
 
 	/*
@@ -1115,11 +1133,40 @@ static int __unmap_and_move(struct folio *src, struct folio *dst,
 		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
 			       !folio_test_ksm(src) && !anon_vma, src);
 		try_to_migrate(src, 0);
-		page_was_mapped = true;
+		page_was_mapped = 1;
 	}
 
-	if (!folio_mapped(src))
-		rc = move_to_new_folio(dst, src, mode);
+	if (!folio_mapped(src)) {
+		__migrate_folio_record(dst, page_was_mapped, anon_vma);
+		return MIGRATEPAGE_UNMAP;
+	}
+
+
+	if (page_was_mapped)
+		remove_migration_ptes(src, src, false);
+
+out_unlock_both:
+	folio_unlock(dst);
+out_unlock:
+	/* Drop an anon_vma reference if we took one */
+	if (anon_vma)
+		put_anon_vma(anon_vma);
+	folio_unlock(src);
+out:
+
+	return rc;
+}
+
+static int __migrate_folio_move(struct folio *src, struct folio *dst,
+				enum migrate_mode mode)
+{
+	int rc;
+	int page_was_mapped = 0;
+	struct anon_vma *anon_vma = NULL;
+
+	__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
+
+	rc = move_to_new_folio(dst, src, mode);
 
 	/*
 	 * When successful, push dst to LRU immediately: so that if it
@@ -1140,14 +1187,11 @@ static int __unmap_and_move(struct folio *src, struct folio *dst,
 		remove_migration_ptes(src,
 			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
 
-out_unlock_both:
 	folio_unlock(dst);
-out_unlock:
 	/* Drop an anon_vma reference if we took one */
 	if (anon_vma)
 		put_anon_vma(anon_vma);
 	folio_unlock(src);
-out:
 	/*
 	 * If migration is successful, decrease refcount of dst,
 	 * which will not free the page because new page owner increased
@@ -1159,19 +1203,32 @@ static int __unmap_and_move(struct folio *src, struct folio *dst,
 	return rc;
 }
 
-/*
- * Obtain the lock on folio, remove all ptes and migrate the folio
- * to the newly allocated folio in dst.
- */
-static int unmap_and_move(new_page_t get_new_page,
-				   free_page_t put_new_page,
-				   unsigned long private, struct folio *src,
-				   int force, enum migrate_mode mode,
-				   enum migrate_reason reason,
-				   struct list_head *ret)
+static void migrate_folio_done(struct folio *src,
+			       enum migrate_reason reason)
+{
+	/*
+	 * Compaction can migrate also non-LRU pages which are
+	 * not accounted to NR_ISOLATED_*. They can be recognized
+	 * as __PageMovable
+	 */
+	if (likely(!__folio_test_movable(src)))
+		mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
+				    folio_is_file_lru(src), -folio_nr_pages(src));
+
+	if (reason != MR_MEMORY_FAILURE)
+		/* We release the page in page_handle_poison. */
+		folio_put(src);
+}
+
+/* Obtain the lock on page, remove all ptes. */
+static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
+			       unsigned long private, struct folio *src,
+			       struct folio **dstp, int force,
+			       enum migrate_mode mode, enum migrate_reason reason,
+			       struct list_head *ret)
 {
 	struct folio *dst;
-	int rc = MIGRATEPAGE_SUCCESS;
+	int rc = MIGRATEPAGE_UNMAP;
 	struct page *newpage = NULL;
 
 	if (!thp_migration_supported() && folio_test_transhuge(src))
@@ -1182,20 +1239,50 @@ static int unmap_and_move(new_page_t get_new_page,
 		folio_clear_active(src);
 		folio_clear_unevictable(src);
 		/* free_pages_prepare() will clear PG_isolated. */
-		goto out;
+		list_del(&src->lru);
+		migrate_folio_done(src, reason);
+		return MIGRATEPAGE_SUCCESS;
 	}
 
 	newpage = get_new_page(&src->page, private);
 	if (!newpage)
 		return -ENOMEM;
 	dst = page_folio(newpage);
+	*dstp = dst;
 
 	dst->private = NULL;
-	rc = __unmap_and_move(src, dst, force, mode);
+	rc = __migrate_folio_unmap(src, dst, force, mode);
+	if (rc == MIGRATEPAGE_UNMAP)
+		return rc;
+
+	/*
+	 * A page that has not been migrated will have kept its
+	 * references and be restored.
+	 */
+	/* restore the folio to right list. */
+	if (rc != -EAGAIN)
+		list_move_tail(&src->lru, ret);
+
+	if (put_new_page)
+		put_new_page(&dst->page, private);
+	else
+		folio_put(dst);
+
+	return rc;
+}
+
+/* Migrate the folio to the newly allocated folio in dst. */
+static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
+			      struct folio *src, struct folio *dst,
+			      enum migrate_mode mode, enum migrate_reason reason,
+			      struct list_head *ret)
+{
+	int rc;
+
+	rc = __migrate_folio_move(src, dst, mode);
 	if (rc == MIGRATEPAGE_SUCCESS)
 		set_page_owner_migrate_reason(&dst->page, reason);
 
-out:
 	if (rc != -EAGAIN) {
 		/*
 		 * A folio that has been migrated has all references
@@ -1211,20 +1298,7 @@ static int unmap_and_move(new_page_t get_new_page,
 	 * we want to retry.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		/*
-		 * Compaction can migrate also non-LRU folios which are
-		 * not accounted to NR_ISOLATED_*. They can be recognized
-		 * as __folio_test_movable
-		 */
-		if (likely(!__folio_test_movable(src)))
-			mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
-					folio_is_file_lru(src), -folio_nr_pages(src));
-
-		if (reason != MR_MEMORY_FAILURE)
-			/*
-			 * We release the folio in page_handle_poison.
-			 */
-			folio_put(src);
+		migrate_folio_done(src, reason);
 	} else {
 		if (rc != -EAGAIN)
 			list_add_tail(&src->lru, ret);
@@ -1499,7 +1573,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	int pass = 0;
 	bool is_large = false;
 	bool is_thp = false;
-	struct folio *folio, *folio2;
+	struct folio *folio, *folio2, *dst = NULL;
 	int rc, nr_pages;
 	LIST_HEAD(split_folios);
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
@@ -1524,9 +1598,13 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 
 			cond_resched();
 
-			rc = unmap_and_move(get_new_page, put_new_page,
-					    private, folio, pass > 2, mode,
-					    reason, ret_folios);
+			rc = migrate_folio_unmap(get_new_page, put_new_page, private,
+						 folio, &dst, pass > 2, mode,
+						 reason, ret_folios);
+			if (rc == MIGRATEPAGE_UNMAP)
+				rc = migrate_folio_move(put_new_page, private,
+							folio, dst, mode,
+							reason, ret_folios);
 			/*
 			 * The rules are:
 			 *	Success: folio will be freed
-- 
2.35.1



* [PATCH 5/8] migrate_pages: batch _unmap and _move
From: Huang Ying @ 2022-12-27  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	Alistair Popple, haoxin

In this patch the _unmap and _move stages of the folio migration are
batched.  That is, previously the flow was,

  for each folio
    _unmap()
    _move()

Now, it is,

  for each folio
    _unmap()
  for each folio
    _move()

Based on this, we can batch the TLB flushing and use a hardware
accelerator to copy folios between the batched _unmap and batched
_move stages.
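
Because each source folio and its destination folio are appended to
unmap_folios and dst_folios as a pair during the _unmap stage, the
_move stage can walk the two lists in lockstep, roughly like this
(condensed from the diff below):

  dst = list_first_entry(&dst_folios, struct folio, lru);
  dst2 = list_next_entry(dst, lru);
  list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
          rc = migrate_folio_move(put_new_page, private, folio, dst,
                                  mode, reason, ret_folios);
          /* handle MIGRATEPAGE_SUCCESS / -EAGAIN / other errors here */
          dst = dst2;
          dst2 = list_next_entry(dst, lru);
  }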

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: haoxin <xhao@linux.alibaba.com>
---
 mm/migrate.c | 189 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 165 insertions(+), 24 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index e2383b430932..dd68c3de3da8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1027,8 +1027,32 @@ static void __migrate_folio_extract(struct folio *dst,
 	dst->private = NULL;
 }
 
+static void migrate_folio_undo_src(struct folio *src,
+				   int page_was_mapped,
+				   struct anon_vma *anon_vma,
+				   struct list_head *ret)
+{
+	if (page_was_mapped)
+		remove_migration_ptes(src, src, false);
+	if (anon_vma)
+		put_anon_vma(anon_vma);
+	folio_unlock(src);
+	list_move_tail(&src->lru, ret);
+}
+
+static void migrate_folio_undo_dst(struct folio *dst,
+				   free_page_t put_new_page,
+				   unsigned long private)
+{
+	folio_unlock(dst);
+	if (put_new_page)
+		put_new_page(&dst->page, private);
+	else
+		folio_put(dst);
+}
+
 static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
-				int force, enum migrate_mode mode)
+				 int force, bool force_lock, enum migrate_mode mode)
 {
 	int rc = -EAGAIN;
 	int page_was_mapped = 0;
@@ -1055,6 +1079,11 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
 		if (current->flags & PF_MEMALLOC)
 			goto out;
 
+		if (!force_lock) {
+			rc = -EDEADLOCK;
+			goto out;
+		}
+
 		folio_lock(src);
 	}
 
@@ -1168,6 +1197,8 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
 
 	rc = move_to_new_folio(dst, src, mode);
 
+	if (rc != -EAGAIN)
+		list_del(&dst->lru);
 	/*
 	 * When successful, push dst to LRU immediately: so that if it
 	 * turns out to be an mlocked page, remove_migration_ptes() will
@@ -1183,6 +1214,11 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
 			lru_add_drain();
 	}
 
+	if (rc == -EAGAIN) {
+		__migrate_folio_record(dst, page_was_mapped, anon_vma);
+		return rc;
+	}
+
 	if (page_was_mapped)
 		remove_migration_ptes(src,
 			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
@@ -1223,7 +1259,7 @@ static void migrate_folio_done(struct folio *src,
 /* Obtain the lock on page, remove all ptes. */
 static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
 			       unsigned long private, struct folio *src,
-			       struct folio **dstp, int force,
+			       struct folio **dstp, int force, bool force_lock,
 			       enum migrate_mode mode, enum migrate_reason reason,
 			       struct list_head *ret)
 {
@@ -1251,7 +1287,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
 	*dstp = dst;
 
 	dst->private = NULL;
-	rc = __migrate_folio_unmap(src, dst, force, mode);
+	rc = __migrate_folio_unmap(src, dst, force, force_lock, mode);
 	if (rc == MIGRATEPAGE_UNMAP)
 		return rc;
 
@@ -1260,7 +1296,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
 	 * references and be restored.
 	 */
 	/* restore the folio to right list. */
-	if (rc != -EAGAIN)
+	if (rc != -EAGAIN && rc != -EDEADLOCK)
 		list_move_tail(&src->lru, ret);
 
 	if (put_new_page)
@@ -1299,9 +1335,8 @@ static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
 		migrate_folio_done(src, reason);
-	} else {
-		if (rc != -EAGAIN)
-			list_add_tail(&src->lru, ret);
+	} else if (rc != -EAGAIN) {
+		list_add_tail(&src->lru, ret);
 
 		if (put_new_page)
 			put_new_page(&dst->page, private);
@@ -1564,7 +1599,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 		enum migrate_mode mode, int reason, struct list_head *ret_folios,
 		struct migrate_pages_stats *stats)
 {
-	int retry = 1;
+	int retry;
 	int large_retry = 1;
 	int thp_retry = 1;
 	int nr_failed = 0;
@@ -1573,13 +1608,19 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	int pass = 0;
 	bool is_large = false;
 	bool is_thp = false;
-	struct folio *folio, *folio2, *dst = NULL;
-	int rc, nr_pages;
+	struct folio *folio, *folio2, *dst = NULL, *dst2;
+	int rc, rc_saved, nr_pages;
 	LIST_HEAD(split_folios);
+	LIST_HEAD(unmap_folios);
+	LIST_HEAD(dst_folios);
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
 	bool no_split_folio_counting = false;
+	bool force_lock;
 
-split_folio_migration:
+retry:
+	rc_saved = 0;
+	force_lock = true;
+	retry = 1;
 	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
 		retry = 0;
 		large_retry = 0;
@@ -1599,16 +1640,15 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 			cond_resched();
 
 			rc = migrate_folio_unmap(get_new_page, put_new_page, private,
-						 folio, &dst, pass > 2, mode,
-						 reason, ret_folios);
-			if (rc == MIGRATEPAGE_UNMAP)
-				rc = migrate_folio_move(put_new_page, private,
-							folio, dst, mode,
-							reason, ret_folios);
+						 folio, &dst, pass > 2, force_lock,
+						 mode, reason, ret_folios);
 			/*
 			 * The rules are:
 			 *	Success: folio will be freed
+			 *	Unmap: folio will be put on unmap_folios list,
+			 *	       dst folio put on dst_folios list
 			 *	-EAGAIN: stay on the from list
+			 *	-EDEADLOCK: stay on the from list
 			 *	-ENOMEM: stay on the from list
 			 *	-ENOSYS: stay on the from list
 			 *	Other errno: put on ret_folios list
@@ -1643,7 +1683,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 			case -ENOMEM:
 				/*
 				 * When memory is low, don't bother to try to migrate
-				 * other folios, just exit.
+				 * other folios, move unmapped folios, then exit.
 				 */
 				if (is_large) {
 					nr_large_failed++;
@@ -1682,7 +1722,14 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				/* nr_failed isn't updated for not used */
 				nr_large_failed += large_retry;
 				stats->nr_thp_failed += thp_retry;
-				goto out;
+				rc_saved = rc;
+				if (list_empty(&unmap_folios))
+					goto out;
+				else
+					goto move;
+			case -EDEADLOCK:
+				rc_saved = rc;
+				goto move;
 			case -EAGAIN:
 				if (is_large) {
 					large_retry++;
@@ -1696,6 +1743,15 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				stats->nr_succeeded += nr_pages;
 				stats->nr_thp_succeeded += is_thp;
 				break;
+			case MIGRATEPAGE_UNMAP:
+				/*
+				 * We have locked some pages, don't force lock
+				 * to avoid deadlock.
+				 */
+				force_lock = false;
+				list_move_tail(&folio->lru, &unmap_folios);
+				list_add_tail(&dst->lru, &dst_folios);
+				break;
 			default:
 				/*
 				 * Permanent failure (-EBUSY, etc.):
@@ -1719,12 +1775,93 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	nr_large_failed += large_retry;
 	stats->nr_thp_failed += thp_retry;
 	stats->nr_failed_pages += nr_retry_pages;
+move:
+	retry = 1;
+	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
+		retry = 0;
+		large_retry = 0;
+		thp_retry = 0;
+		nr_retry_pages = 0;
+
+		dst = list_first_entry(&dst_folios, struct folio, lru);
+		dst2 = list_next_entry(dst, lru);
+		list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
+			is_large = folio_test_large(folio);
+			is_thp = is_large && folio_test_pmd_mappable(folio);
+			nr_pages = folio_nr_pages(folio);
+
+			cond_resched();
+
+			rc = migrate_folio_move(put_new_page, private,
+						folio, dst, mode,
+						reason, ret_folios);
+			/*
+			 * The rules are:
+			 *	Success: folio will be freed
+			 *	-EAGAIN: stay on the unmap_folios list
+			 *	Other errno: put on ret_folios list
+			 */
+			switch(rc) {
+			case -EAGAIN:
+				if (is_large) {
+					large_retry++;
+					thp_retry += is_thp;
+				} else if (!no_split_folio_counting) {
+					retry++;
+				}
+				nr_retry_pages += nr_pages;
+				break;
+			case MIGRATEPAGE_SUCCESS:
+				stats->nr_succeeded += nr_pages;
+				stats->nr_thp_succeeded += is_thp;
+				break;
+			default:
+				if (is_large) {
+					nr_large_failed++;
+					stats->nr_thp_failed += is_thp;
+				} else if (!no_split_folio_counting) {
+					nr_failed++;
+				}
+
+				stats->nr_failed_pages += nr_pages;
+				break;
+			}
+			dst = dst2;
+			dst2 = list_next_entry(dst, lru);
+		}
+	}
+	nr_failed += retry;
+	nr_large_failed += large_retry;
+	stats->nr_thp_failed += thp_retry;
+	stats->nr_failed_pages += nr_retry_pages;
+
+	if (rc_saved)
+		rc = rc_saved;
+	else
+		rc = nr_failed + nr_large_failed;
+out:
+	/* Cleanup remaining folios */
+	dst = list_first_entry(&dst_folios, struct folio, lru);
+	dst2 = list_next_entry(dst, lru);
+	list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
+		int page_was_mapped = 0;
+		struct anon_vma *anon_vma = NULL;
+
+		__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
+		migrate_folio_undo_src(folio, page_was_mapped, anon_vma,
+				       ret_folios);
+		list_del(&dst->lru);
+		migrate_folio_undo_dst(dst, put_new_page, private);
+		dst = dst2;
+		dst2 = list_next_entry(dst, lru);
+	}
+
 	/*
 	 * Try to migrate split folios of fail-to-migrate large folios, no
 	 * nr_failed counting in this round, since all split folios of a
 	 * large folio is counted as 1 failure in the first round.
 	 */
-	if (!list_empty(&split_folios)) {
+	if (rc >= 0 && !list_empty(&split_folios)) {
 		/*
 		 * Move non-migrated folios (after 10 retries) to ret_folios
 		 * to avoid migrating them again.
@@ -1732,12 +1869,16 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 		list_splice_init(from, ret_folios);
 		list_splice_init(&split_folios, from);
 		no_split_folio_counting = true;
-		retry = 1;
-		goto split_folio_migration;
+		goto retry;
 	}
 
-	rc = nr_failed + nr_large_failed;
-out:
+	/*
+	 * We have unlocked all locked pages, so we can force lock now, let's
+	 * try again.
+	 */
+	if (rc == -EDEADLOCK)
+		goto retry;
+
 	return rc;
 }
 
-- 
2.35.1



* [PATCH 6/8] migrate_pages: move migrate_folio_done() and migrate_folio_unmap()
From: Huang Ying @ 2022-12-27  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	Alistair Popple, haoxin

Just move the position of two functions.  There is no functionality
change.  This is to make it easier to review the next patch by putting
the code near its position in that patch.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: haoxin <xhao@linux.alibaba.com>
---
 mm/migrate.c | 136 +++++++++++++++++++++++++--------------------------
 1 file changed, 68 insertions(+), 68 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index dd68c3de3da8..70b987391296 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1051,6 +1051,23 @@ static void migrate_folio_undo_dst(struct folio *dst,
 		folio_put(dst);
 }
 
+static void migrate_folio_done(struct folio *src,
+			       enum migrate_reason reason)
+{
+	/*
+	 * Compaction can migrate also non-LRU pages which are
+	 * not accounted to NR_ISOLATED_*. They can be recognized
+	 * as __PageMovable
+	 */
+	if (likely(!__folio_test_movable(src)))
+		mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
+				    folio_is_file_lru(src), -folio_nr_pages(src));
+
+	if (reason != MR_MEMORY_FAILURE)
+		/* We release the page in page_handle_poison. */
+		folio_put(src);
+}
+
 static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
 				 int force, bool force_lock, enum migrate_mode mode)
 {
@@ -1186,6 +1203,57 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
 	return rc;
 }
 
+/* Obtain the lock on page, remove all ptes. */
+static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
+			       unsigned long private, struct folio *src,
+			       struct folio **dstp, int force, bool force_lock,
+			       enum migrate_mode mode, enum migrate_reason reason,
+			       struct list_head *ret)
+{
+	struct folio *dst;
+	int rc = MIGRATEPAGE_UNMAP;
+	struct page *newpage = NULL;
+
+	if (!thp_migration_supported() && folio_test_transhuge(src))
+		return -ENOSYS;
+
+	if (folio_ref_count(src) == 1) {
+		/* Folio was freed from under us. So we are done. */
+		folio_clear_active(src);
+		folio_clear_unevictable(src);
+		/* free_pages_prepare() will clear PG_isolated. */
+		list_del(&src->lru);
+		migrate_folio_done(src, reason);
+		return MIGRATEPAGE_SUCCESS;
+	}
+
+	newpage = get_new_page(&src->page, private);
+	if (!newpage)
+		return -ENOMEM;
+	dst = page_folio(newpage);
+	*dstp = dst;
+
+	dst->private = NULL;
+	rc = __migrate_folio_unmap(src, dst, force, force_lock, mode);
+	if (rc == MIGRATEPAGE_UNMAP)
+		return rc;
+
+	/*
+	 * A page that has not been migrated will have kept its
+	 * references and be restored.
+	 */
+	/* restore the folio to right list. */
+	if (rc != -EAGAIN && rc != -EDEADLOCK)
+		list_move_tail(&src->lru, ret);
+
+	if (put_new_page)
+		put_new_page(&dst->page, private);
+	else
+		folio_put(dst);
+
+	return rc;
+}
+
 static int __migrate_folio_move(struct folio *src, struct folio *dst,
 				enum migrate_mode mode)
 {
@@ -1239,74 +1307,6 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
 	return rc;
 }
 
-static void migrate_folio_done(struct folio *src,
-			       enum migrate_reason reason)
-{
-	/*
-	 * Compaction can migrate also non-LRU pages which are
-	 * not accounted to NR_ISOLATED_*. They can be recognized
-	 * as __PageMovable
-	 */
-	if (likely(!__folio_test_movable(src)))
-		mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
-				    folio_is_file_lru(src), -folio_nr_pages(src));
-
-	if (reason != MR_MEMORY_FAILURE)
-		/* We release the page in page_handle_poison. */
-		folio_put(src);
-}
-
-/* Obtain the lock on page, remove all ptes. */
-static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
-			       unsigned long private, struct folio *src,
-			       struct folio **dstp, int force, bool force_lock,
-			       enum migrate_mode mode, enum migrate_reason reason,
-			       struct list_head *ret)
-{
-	struct folio *dst;
-	int rc = MIGRATEPAGE_UNMAP;
-	struct page *newpage = NULL;
-
-	if (!thp_migration_supported() && folio_test_transhuge(src))
-		return -ENOSYS;
-
-	if (folio_ref_count(src) == 1) {
-		/* Folio was freed from under us. So we are done. */
-		folio_clear_active(src);
-		folio_clear_unevictable(src);
-		/* free_pages_prepare() will clear PG_isolated. */
-		list_del(&src->lru);
-		migrate_folio_done(src, reason);
-		return MIGRATEPAGE_SUCCESS;
-	}
-
-	newpage = get_new_page(&src->page, private);
-	if (!newpage)
-		return -ENOMEM;
-	dst = page_folio(newpage);
-	*dstp = dst;
-
-	dst->private = NULL;
-	rc = __migrate_folio_unmap(src, dst, force, force_lock, mode);
-	if (rc == MIGRATEPAGE_UNMAP)
-		return rc;
-
-	/*
-	 * A page that has not been migrated will have kept its
-	 * references and be restored.
-	 */
-	/* restore the folio to right list. */
-	if (rc != -EAGAIN && rc != -EDEADLOCK)
-		list_move_tail(&src->lru, ret);
-
-	if (put_new_page)
-		put_new_page(&dst->page, private);
-	else
-		folio_put(dst);
-
-	return rc;
-}
-
 /* Migrate the folio to the newly allocated folio in dst. */
 static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
 			      struct folio *src, struct folio *dst,
-- 
2.35.1



* [PATCH 7/8] migrate_pages: share more code between _unmap and _move
From: Huang Ying @ 2022-12-27  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	Alistair Popple, haoxin

This is a code cleanup patch to reduce the duplicated code between the
_unmap and _move stages of migrate_pages().  No functionality change
is expected.
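
Concretely, after this patch both the _unmap and the _move failure
paths funnel their cleanup through the same two helpers; roughly, the
_unmap failure path becomes (the _move path does the same with both
folios known to be locked; see the hunks below for the real code):

	/*
	 * Undo what _unmap did to the source folio, unless it should
	 * simply stay on the "from" list for a retry (ret == NULL).
	 */
	if (rc == -EAGAIN || rc == -EDEADLOCK)
		ret = NULL;
	migrate_folio_undo_src(src, page_was_mapped, anon_vma, locked, ret);
	if (dst)
		migrate_folio_undo_dst(dst, dst_locked, put_new_page, private);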

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: haoxin <xhao@linux.alibaba.com>
---
 mm/migrate.c | 208 ++++++++++++++++++++-------------------------------
 1 file changed, 82 insertions(+), 126 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 70b987391296..70a40b8fee1f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1030,21 +1030,26 @@ static void __migrate_folio_extract(struct folio *dst,
 static void migrate_folio_undo_src(struct folio *src,
 				   int page_was_mapped,
 				   struct anon_vma *anon_vma,
+				   bool locked,
 				   struct list_head *ret)
 {
 	if (page_was_mapped)
 		remove_migration_ptes(src, src, false);
 	if (anon_vma)
 		put_anon_vma(anon_vma);
-	folio_unlock(src);
-	list_move_tail(&src->lru, ret);
+	if (locked)
+		folio_unlock(src);
+	if (ret)
+		list_move_tail(&src->lru, ret);
 }
 
 static void migrate_folio_undo_dst(struct folio *dst,
+				   bool locked,
 				   free_page_t put_new_page,
 				   unsigned long private)
 {
-	folio_unlock(dst);
+	if (locked)
+		folio_unlock(dst);
 	if (put_new_page)
 		put_new_page(&dst->page, private);
 	else
@@ -1068,14 +1073,44 @@ static void migrate_folio_done(struct folio *src,
 		folio_put(src);
 }
 
-static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
-				 int force, bool force_lock, enum migrate_mode mode)
+/* Obtain the lock on page, remove all ptes. */
+static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
+			       unsigned long private, struct folio *src,
+			       struct folio **dstp, int force, bool force_lock,
+			       enum migrate_mode mode, enum migrate_reason reason,
+			       struct list_head *ret)
 {
-	int rc = -EAGAIN;
+	struct folio *dst;
+	int rc = MIGRATEPAGE_UNMAP;
+	struct page *newpage = NULL;
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
 	bool is_lru = !__PageMovable(&src->page);
+	bool locked = false;
+	bool dst_locked = false;
+
+	if (!thp_migration_supported() && folio_test_transhuge(src))
+		return -ENOSYS;
+
+	if (folio_ref_count(src) == 1) {
+		/* Folio was freed from under us. So we are done. */
+		folio_clear_active(src);
+		folio_clear_unevictable(src);
+		/* free_pages_prepare() will clear PG_isolated. */
+		list_del(&src->lru);
+		migrate_folio_done(src, reason);
+		return MIGRATEPAGE_SUCCESS;
+	}
+
+	newpage = get_new_page(&src->page, private);
+	if (!newpage)
+		return -ENOMEM;
+	dst = page_folio(newpage);
+	*dstp = dst;
+
+	dst->private = NULL;
 
+	rc = -EAGAIN;
 	if (!folio_trylock(src)) {
 		if (!force || mode == MIGRATE_ASYNC)
 			goto out;
@@ -1103,6 +1138,7 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
 
 		folio_lock(src);
 	}
+	locked = true;
 
 	if (folio_test_writeback(src)) {
 		/*
@@ -1117,10 +1153,10 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
 			break;
 		default:
 			rc = -EBUSY;
-			goto out_unlock;
+			goto out;
 		}
 		if (!force)
-			goto out_unlock;
+			goto out;
 		folio_wait_writeback(src);
 	}
 
@@ -1150,7 +1186,8 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
 	 * This is much like races on refcount of oldpage: just don't BUG().
 	 */
 	if (unlikely(!folio_trylock(dst)))
-		goto out_unlock;
+		goto out;
+	dst_locked = true;
 
 	if (unlikely(!is_lru)) {
 		__migrate_folio_record(dst, page_was_mapped, anon_vma);
@@ -1172,7 +1209,7 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
 	if (!src->mapping) {
 		if (folio_test_private(src)) {
 			try_to_free_buffers(src);
-			goto out_unlock_both;
+			goto out;
 		}
 	} else if (folio_mapped(src)) {
 		/* Establish migration ptes */
@@ -1187,75 +1224,27 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
 		return MIGRATEPAGE_UNMAP;
 	}
 
-
-	if (page_was_mapped)
-		remove_migration_ptes(src, src, false);
-
-out_unlock_both:
-	folio_unlock(dst);
-out_unlock:
-	/* Drop an anon_vma reference if we took one */
-	if (anon_vma)
-		put_anon_vma(anon_vma);
-	folio_unlock(src);
 out:
-
-	return rc;
-}
-
-/* Obtain the lock on page, remove all ptes. */
-static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
-			       unsigned long private, struct folio *src,
-			       struct folio **dstp, int force, bool force_lock,
-			       enum migrate_mode mode, enum migrate_reason reason,
-			       struct list_head *ret)
-{
-	struct folio *dst;
-	int rc = MIGRATEPAGE_UNMAP;
-	struct page *newpage = NULL;
-
-	if (!thp_migration_supported() && folio_test_transhuge(src))
-		return -ENOSYS;
-
-	if (folio_ref_count(src) == 1) {
-		/* Folio was freed from under us. So we are done. */
-		folio_clear_active(src);
-		folio_clear_unevictable(src);
-		/* free_pages_prepare() will clear PG_isolated. */
-		list_del(&src->lru);
-		migrate_folio_done(src, reason);
-		return MIGRATEPAGE_SUCCESS;
-	}
-
-	newpage = get_new_page(&src->page, private);
-	if (!newpage)
-		return -ENOMEM;
-	dst = page_folio(newpage);
-	*dstp = dst;
-
-	dst->private = NULL;
-	rc = __migrate_folio_unmap(src, dst, force, force_lock, mode);
-	if (rc == MIGRATEPAGE_UNMAP)
-		return rc;
-
 	/*
 	 * A page that has not been migrated will have kept its
 	 * references and be restored.
 	 */
 	/* restore the folio to right list. */
-	if (rc != -EAGAIN && rc != -EDEADLOCK)
-		list_move_tail(&src->lru, ret);
+	if (rc == -EAGAIN || rc == -EDEADLOCK)
+		ret = NULL;
 
-	if (put_new_page)
-		put_new_page(&dst->page, private);
-	else
-		folio_put(dst);
+	migrate_folio_undo_src(src, page_was_mapped, anon_vma, locked, ret);
+	if (dst)
+		migrate_folio_undo_dst(dst, dst_locked, put_new_page, private);
 
 	return rc;
 }
 
-static int __migrate_folio_move(struct folio *src, struct folio *dst,
-				enum migrate_mode mode)
+/* Migrate the folio to the newly allocated folio in dst. */
+static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
+			      struct folio *src, struct folio *dst,
+			      enum migrate_mode mode, enum migrate_reason reason,
+			      struct list_head *ret)
 {
 	int rc;
 	int page_was_mapped = 0;
@@ -1264,9 +1253,10 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
 	__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
 
 	rc = move_to_new_folio(dst, src, mode);
+	if (rc)
+		goto out;
 
-	if (rc != -EAGAIN)
-		list_del(&dst->lru);
+	list_del(&dst->lru);
 	/*
 	 * When successful, push dst to LRU immediately: so that if it
 	 * turns out to be an mlocked page, remove_migration_ptes() will
@@ -1276,74 +1266,40 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
 	 * unsuccessful, and other cases when a page has been temporarily
 	 * isolated from the unevictable LRU: but this case is the easiest.
 	 */
-	if (rc == MIGRATEPAGE_SUCCESS) {
-		folio_add_lru(dst);
-		if (page_was_mapped)
-			lru_add_drain();
-	}
-
-	if (rc == -EAGAIN) {
-		__migrate_folio_record(dst, page_was_mapped, anon_vma);
-		return rc;
-	}
-
+	folio_add_lru(dst);
 	if (page_was_mapped)
-		remove_migration_ptes(src,
-			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
+		lru_add_drain();
 
+	if (page_was_mapped)
+		remove_migration_ptes(src, dst, false);
 	folio_unlock(dst);
-	/* Drop an anon_vma reference if we took one */
-	if (anon_vma)
-		put_anon_vma(anon_vma);
-	folio_unlock(src);
+	set_page_owner_migrate_reason(&dst->page, reason);
 	/*
 	 * If migration is successful, decrease refcount of dst,
 	 * which will not free the page because new page owner increased
 	 * refcounter.
 	 */
-	if (rc == MIGRATEPAGE_SUCCESS)
-		folio_put(dst);
-
-	return rc;
-}
-
-/* Migrate the folio to the newly allocated folio in dst. */
-static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
-			      struct folio *src, struct folio *dst,
-			      enum migrate_mode mode, enum migrate_reason reason,
-			      struct list_head *ret)
-{
-	int rc;
-
-	rc = __migrate_folio_move(src, dst, mode);
-	if (rc == MIGRATEPAGE_SUCCESS)
-		set_page_owner_migrate_reason(&dst->page, reason);
-
-	if (rc != -EAGAIN) {
-		/*
-		 * A folio that has been migrated has all references
-		 * removed and will be freed. A folio that has not been
-		 * migrated will have kept its references and be restored.
-		 */
-		list_del(&src->lru);
-	}
+	folio_put(dst);
 
 	/*
-	 * If migration is successful, releases reference grabbed during
-	 * isolation. Otherwise, restore the folio to right list unless
-	 * we want to retry.
+	 * A page that has been migrated has all references removed
+	 * and will be freed.
 	 */
-	if (rc == MIGRATEPAGE_SUCCESS) {
-		migrate_folio_done(src, reason);
-	} else if (rc != -EAGAIN) {
-		list_add_tail(&src->lru, ret);
+	list_del(&src->lru);
+	migrate_folio_undo_src(src, 0, anon_vma, true, NULL);
+	migrate_folio_done(src, reason);
 
-		if (put_new_page)
-			put_new_page(&dst->page, private);
-		else
-			folio_put(dst);
+	return rc;
+out:
+	if (rc == -EAGAIN) {
+		__migrate_folio_record(dst, page_was_mapped, anon_vma);
+		return rc;
 	}
 
+	migrate_folio_undo_src(src, page_was_mapped, anon_vma, true, ret);
+	list_del(&dst->lru);
+	migrate_folio_undo_dst(dst, true, put_new_page, private);
+
 	return rc;
 }
 
@@ -1849,9 +1805,9 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 
 		__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
 		migrate_folio_undo_src(folio, page_was_mapped, anon_vma,
-				       ret_folios);
+				       true, ret_folios);
 		list_del(&dst->lru);
-		migrate_folio_undo_dst(dst, put_new_page, private);
+		migrate_folio_undo_dst(dst, true, put_new_page, private);
 		dst = dst2;
 		dst2 = list_next_entry(dst, lru);
 	}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 8/8] migrate_pages: batch flushing TLB
  2022-12-27  0:28 [PATCH 0/8] migrate_pages(): batch TLB flushing Huang Ying
                   ` (6 preceding siblings ...)
  2022-12-27  0:28 ` [PATCH 7/8] migrate_pages: share more code between _unmap and _move Huang Ying
@ 2022-12-27  0:28 ` Huang Ying
  2023-01-03 19:19   ` Zi Yan
  7 siblings, 1 reply; 39+ messages in thread
From: Huang Ying @ 2022-12-27  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	Alistair Popple, haoxin

The TLB flushing can cost quite a few CPU cycles during the folio
migration in some situations, for example, when migrating a folio of a
process with multiple active threads that run on multiple CPUs.  After
batching the _unmap and _move stages in migrate_pages(), the TLB
flushing can easily be batched on top of the existing TLB flush
batching mechanism.  This patch implements that.
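
In rough pseudo code (a sketch of the batched flow, not literal code):

  /* unmap stage: clear the PTEs but defer the TLB flush */
  for each folio
    try_to_migrate(src, TTU_BATCH_FLUSH)

  /* one batched TLB flush for all folios unmapped above */
  try_to_unmap_flush()

  /* move stage */
  for each folio
    migrate_folio_move()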

We use the following test case to test the patch.

On a 2-socket Intel server,

- Run pmbench memory accessing benchmark

- Run `migratepages` to migrate pages of pmbench between node 0 and
  node 1 back and forth.

With the patch, the number of TLB flushing IPIs is reduced by 99.1%
during the test, and the number of pages migrated successfully per
second increases by 291.7%.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: haoxin <xhao@linux.alibaba.com>
---
 mm/migrate.c |  4 +++-
 mm/rmap.c    | 20 +++++++++++++++++---
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 70a40b8fee1f..d7413164e748 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1215,7 +1215,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
 		/* Establish migration ptes */
 		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
 			       !folio_test_ksm(src) && !anon_vma, src);
-		try_to_migrate(src, 0);
+		try_to_migrate(src, TTU_BATCH_FLUSH);
 		page_was_mapped = 1;
 	}
 
@@ -1732,6 +1732,8 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	stats->nr_thp_failed += thp_retry;
 	stats->nr_failed_pages += nr_retry_pages;
 move:
+	try_to_unmap_flush();
+
 	retry = 1;
 	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
 		retry = 0;
diff --git a/mm/rmap.c b/mm/rmap.c
index b616870a09be..2e125f3e462e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1976,7 +1976,21 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		} else {
 			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 			/* Nuke the page table entry. */
-			pteval = ptep_clear_flush(vma, address, pvmw.pte);
+			if (should_defer_flush(mm, flags)) {
+				/*
+				 * We clear the PTE but do not flush so potentially
+				 * a remote CPU could still be writing to the folio.
+				 * If the entry was previously clean then the
+				 * architecture must guarantee that a clear->dirty
+				 * transition on a cached TLB entry is written through
+				 * and traps if the PTE is unmapped.
+				 */
+				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+
+				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+			} else {
+				pteval = ptep_clear_flush(vma, address, pvmw.pte);
+			}
 		}
 
 		/* Set the dirty flag on the folio now the pte is gone. */
@@ -2148,10 +2162,10 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
 
 	/*
 	 * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and
-	 * TTU_SPLIT_HUGE_PMD and TTU_SYNC flags.
+	 * TTU_SPLIT_HUGE_PMD, TTU_SYNC, and TTU_BATCH_FLUSH flags.
 	 */
 	if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
-					TTU_SYNC)))
+					TTU_SYNC | TTU_BATCH_FLUSH)))
 		return;
 
 	if (folio_is_zone_device(folio) &&
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/8] migrate_pages: separate hugetlb folios migration
  2022-12-27  0:28 ` [PATCH 2/8] migrate_pages: separate hugetlb folios migration Huang Ying
@ 2022-12-28 23:17   ` Andrew Morton
  2023-01-02 23:53     ` Huang, Ying
  2023-01-05  4:13   ` Alistair Popple
  1 sibling, 1 reply; 39+ messages in thread
From: Andrew Morton @ 2022-12-28 23:17 UTC (permalink / raw)
  To: Huang Ying
  Cc: linux-mm, linux-kernel, Zi Yan, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

On Tue, 27 Dec 2022 08:28:53 +0800 Huang Ying <ying.huang@intel.com> wrote:

> This is a preparation patch to batch the folio unmapping and moving
> for the non-hugetlb folios.  Based on that we can batch the TLB
> shootdown during the folio migration and make it possible to use some
> hardware accelerator for the folio copying.
> 
> In this patch the migration of hugetlb folios and non-hugetlb folios
> is separated in migrate_pages() to make it easy to change the
> non-hugetlb folios migration implementation.
> 
> ...
>
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1404,6 +1404,87 @@ struct migrate_pages_stats {
>  	int nr_thp_split;
>  };
>  
> +static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
> +			    free_page_t put_new_page, unsigned long private,
> +			    enum migrate_mode mode, int reason,
> +			    struct migrate_pages_stats *stats,
> +			    struct list_head *ret_folios)
> +{
> +	int retry = 1;
> +	int nr_failed = 0;
> +	int nr_retry_pages = 0;
> +	int pass = 0;
> +	struct folio *folio, *folio2;
> +	int rc = 0, nr_pages;
> +
> +	for (pass = 0; pass < 10 && retry; pass++) {

Why 10?

> +		retry = 0;
> +		nr_retry_pages = 0;
> +
> +		list_for_each_entry_safe(folio, folio2, from, lru) {
> +			if (!folio_test_hugetlb(folio))
> +				continue;
> +
> +			nr_pages = folio_nr_pages(folio);
> +
> +			cond_resched();
> +
> +			rc = unmap_and_move_huge_page(get_new_page,
> +						      put_new_page, private,
> +						      &folio->page, pass > 2, mode,
> +						      reason, ret_folios);
> +			/*
> +			 * The rules are:
> +			 *	Success: hugetlb folio will be put back
> +			 *	-EAGAIN: stay on the from list
> +			 *	-ENOMEM: stay on the from list
> +			 *	-ENOSYS: stay on the from list
> +			 *	Other errno: put on ret_folios list
> +			 */
> +			switch(rc) {
> +			case -ENOSYS:
> +				/* Hugetlb migration is unsupported */
> +				nr_failed++;
> +				stats->nr_failed_pages += nr_pages;
> +				list_move_tail(&folio->lru, ret_folios);
> +				break;
> +			case -ENOMEM:
> +				/*
> +				 * When memory is low, don't bother to try to migrate
> +				 * other folios, just exit.
> +				 */
> +				nr_failed++;
> +				stats->nr_failed_pages += nr_pages;
> +				goto out;
> +			case -EAGAIN:
> +				retry++;
> +				nr_retry_pages += nr_pages;
> +				break;
> +			case MIGRATEPAGE_SUCCESS:
> +				stats->nr_succeeded += nr_pages;
> +				break;
> +			default:
> +				/*
> +				 * Permanent failure (-EBUSY, etc.):
> +				 * unlike -EAGAIN case, the failed folio is
> +				 * removed from migration folio list and not
> +				 * retried in the next outer loop.
> +				 */
> +				nr_failed++;
> +				stats->nr_failed_pages += nr_pages;
> +				break;
> +			}
> +		}
> +	}
> +out:
> +	nr_failed += retry;
> +	stats->nr_failed_pages += nr_retry_pages;
> +	if (rc != -ENOMEM)
> +		rc = nr_failed;
> +
> +	return rc;
> +}

The interpretation of the return value of this function is somewhat
unobvious.

I suggest that this function be fully commented.

Why does a retry contribute to nr_failed.  What is the interpretation
of nr_failed.  etcetera.




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/8] migrate_pages: batch _unmap and _move
  2022-12-27  0:28 ` [PATCH 5/8] migrate_pages: batch _unmap and _move Huang Ying
@ 2022-12-28 23:22   ` Andrew Morton
  2023-01-02 23:29     ` Huang, Ying
  2023-01-03 19:01   ` Zi Yan
  1 sibling, 1 reply; 39+ messages in thread
From: Andrew Morton @ 2022-12-28 23:22 UTC (permalink / raw)
  To: Huang Ying
  Cc: linux-mm, linux-kernel, Zi Yan, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

On Tue, 27 Dec 2022 08:28:56 +0800 Huang Ying <ying.huang@intel.com> wrote:

> In this patch the _unmap and _move stages of the folio migration are
> batched.  That is, previously it was,
> 
>   for each folio
>     _unmap()
>     _move()
> 
> Now, it is,
> 
>   for each folio
>     _unmap()
>   for each folio
>     _move()
> 
> Based on this, we can batch the TLB flushing and use some hardware
> accelerator to copy folios between batched _unmap and batched _move
> stages.
> 
> ...
>
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1027,8 +1027,32 @@ static void __migrate_folio_extract(struct folio *dst,
>  	dst->private = NULL;
>  }
>  
> +static void migrate_folio_undo_src(struct folio *src,
> +				   int page_was_mapped,
> +				   struct anon_vma *anon_vma,
> +				   struct list_head *ret)
> +{
> +	if (page_was_mapped)
> +		remove_migration_ptes(src, src, false);
> +	if (anon_vma)
> +		put_anon_vma(anon_vma);
> +	folio_unlock(src);
> +	list_move_tail(&src->lru, ret);
> +}
> +
> +static void migrate_folio_undo_dst(struct folio *dst,
> +				   free_page_t put_new_page,
> +				   unsigned long private)
> +{
> +	folio_unlock(dst);
> +	if (put_new_page)
> +		put_new_page(&dst->page, private);
> +	else
> +		folio_put(dst);
> +}

What do the above do?  Are they so obvious that no comments are needed?


>  static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
> -				int force, enum migrate_mode mode)
> +				 int force, bool force_lock, enum migrate_mode mode)
>  {
>  	int rc = -EAGAIN;
>  	int page_was_mapped = 0;
> @@ -1055,6 +1079,11 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>  		if (current->flags & PF_MEMALLOC)
>  			goto out;
>  
> +		if (!force_lock) {
> +			rc = -EDEADLOCK;
> +			goto out;
> +		}

Please document the use of EDEADLOCK in this code.  What does it signify?

>  		folio_lock(src);
>  	}
>  


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/8] migrate_pages: batch _unmap and _move
  2022-12-28 23:22   ` Andrew Morton
@ 2023-01-02 23:29     ` Huang, Ying
  0 siblings, 0 replies; 39+ messages in thread
From: Huang, Ying @ 2023-01-02 23:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Zi Yan, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

Andrew Morton <akpm@linux-foundation.org> writes:

> On Tue, 27 Dec 2022 08:28:56 +0800 Huang Ying <ying.huang@intel.com> wrote:
>
>> In this patch the _unmap and _move stages of the folio migration are
>> batched.  That is, previously it was,
>> 
>>   for each folio
>>     _unmap()
>>     _move()
>> 
>> Now, it is,
>> 
>>   for each folio
>>     _unmap()
>>   for each folio
>>     _move()
>> 
>> Based on this, we can batch the TLB flushing and use some hardware
>> accelerator to copy folios between batched _unmap and batched _move
>> stages.
>> 
>> ...
>>
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1027,8 +1027,32 @@ static void __migrate_folio_extract(struct folio *dst,
>>  	dst->private = NULL;
>>  }
>>  
>> +static void migrate_folio_undo_src(struct folio *src,
>> +				   int page_was_mapped,
>> +				   struct anon_vma *anon_vma,
>> +				   struct list_head *ret)
>> +{
>> +	if (page_was_mapped)
>> +		remove_migration_ptes(src, src, false);
>> +	if (anon_vma)
>> +		put_anon_vma(anon_vma);
>> +	folio_unlock(src);
>> +	list_move_tail(&src->lru, ret);
>> +}
>> +
>> +static void migrate_folio_undo_dst(struct folio *dst,
>> +				   free_page_t put_new_page,
>> +				   unsigned long private)
>> +{
>> +	folio_unlock(dst);
>> +	if (put_new_page)
>> +		put_new_page(&dst->page, private);
>> +	else
>> +		folio_put(dst);
>> +}
>
> What do the above do?  Are they so obvious that no comments are needed?

Thank you for the reminder, will add comments.
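
For example (sketch, to be polished in the next version):

/*
 * Undo the _unmap stage for a source folio that will not be migrated:
 * remove the migration PTEs, drop the anon_vma reference, unlock the
 * folio, and put it back on the right list.
 */
static void migrate_folio_undo_src(struct folio *src,
				   int page_was_mapped,
				   struct anon_vma *anon_vma,
				   struct list_head *ret)
{
	if (page_was_mapped)
		remove_migration_ptes(src, src, false);
	if (anon_vma)
		put_anon_vma(anon_vma);
	folio_unlock(src);
	list_move_tail(&src->lru, ret);
}

/*
 * Release the destination folio of a failed migration: unlock it and
 * free it via the caller supplied put_new_page() callback, or drop the
 * reference taken when it was allocated.
 */
static void migrate_folio_undo_dst(struct folio *dst,
				   free_page_t put_new_page,
				   unsigned long private)
{
	folio_unlock(dst);
	if (put_new_page)
		put_new_page(&dst->page, private);
	else
		folio_put(dst);
}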

>
>>  static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>> -				int force, enum migrate_mode mode)
>> +				 int force, bool force_lock, enum migrate_mode mode)
>>  {
>>  	int rc = -EAGAIN;
>>  	int page_was_mapped = 0;
>> @@ -1055,6 +1079,11 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>>  		if (current->flags & PF_MEMALLOC)
>>  			goto out;
>>  
>> +		if (!force_lock) {
>> +			rc = -EDEADLOCK;
>> +			goto out;
>> +		}
>
> Please document the use of EDEADLOCK in this code.  What does it signify?

Sure.  Will do that in the next version.

Best Regards,
Huang, Ying

>>  		folio_lock(src);
>>  	}
>>  

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/8] migrate_pages: separate hugetlb folios migration
  2022-12-28 23:17   ` Andrew Morton
@ 2023-01-02 23:53     ` Huang, Ying
  0 siblings, 0 replies; 39+ messages in thread
From: Huang, Ying @ 2023-01-02 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Zi Yan, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

Andrew Morton <akpm@linux-foundation.org> writes:

> On Tue, 27 Dec 2022 08:28:53 +0800 Huang Ying <ying.huang@intel.com> wrote:
>
>> This is a preparation patch to batch the folio unmapping and moving
>> for the non-hugetlb folios.  Based on that we can batch the TLB
>> shootdown during the folio migration and make it possible to use some
>> hardware accelerator for the folio copying.
>> 
>> In this patch the migration of hugetlb folios and non-hugetlb folios
>> is separated in migrate_pages() to make it easy to change the
>> non-hugetlb folios migration implementation.
>> 
>> ...
>>
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1404,6 +1404,87 @@ struct migrate_pages_stats {
>>  	int nr_thp_split;
>>  };
>>  
>> +static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
>> +			    free_page_t put_new_page, unsigned long private,
>> +			    enum migrate_mode mode, int reason,
>> +			    struct migrate_pages_stats *stats,
>> +			    struct list_head *ret_folios)
>> +{
>> +	int retry = 1;
>> +	int nr_failed = 0;
>> +	int nr_retry_pages = 0;
>> +	int pass = 0;
>> +	struct folio *folio, *folio2;
>> +	int rc = 0, nr_pages;
>> +
>> +	for (pass = 0; pass < 10 && retry; pass++) {
>
> Why 10?

This is inherited from the original max pass number in migrate_pages(),
which was introduced in commit 49d2e9cc4544 ("[PATCH] Swap Migration V5:
migrate_pages() function").  From the code and commit message, I cannot
find out why 10 was chosen.  I guess that we need some magic number
anyway.

Now, because the magic number is used in 2 places (migrate_pages() and
migrate_hugetlbs()), would it be better to define it as a constant
macro?
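
For example, something like the following (the macro name is just an
illustration):

#define NR_MAX_MIGRATE_PAGES_RETRY	10

	for (pass = 0; pass < NR_MAX_MIGRATE_PAGES_RETRY && retry; pass++) {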

>> +		retry = 0;
>> +		nr_retry_pages = 0;
>> +
>> +		list_for_each_entry_safe(folio, folio2, from, lru) {
>> +			if (!folio_test_hugetlb(folio))
>> +				continue;
>> +
>> +			nr_pages = folio_nr_pages(folio);
>> +
>> +			cond_resched();
>> +
>> +			rc = unmap_and_move_huge_page(get_new_page,
>> +						      put_new_page, private,
>> +						      &folio->page, pass > 2, mode,
>> +						      reason, ret_folios);
>> +			/*
>> +			 * The rules are:
>> +			 *	Success: hugetlb folio will be put back
>> +			 *	-EAGAIN: stay on the from list
>> +			 *	-ENOMEM: stay on the from list
>> +			 *	-ENOSYS: stay on the from list
>> +			 *	Other errno: put on ret_folios list
>> +			 */
>> +			switch(rc) {
>> +			case -ENOSYS:
>> +				/* Hugetlb migration is unsupported */
>> +				nr_failed++;
>> +				stats->nr_failed_pages += nr_pages;
>> +				list_move_tail(&folio->lru, ret_folios);
>> +				break;
>> +			case -ENOMEM:
>> +				/*
>> +				 * When memory is low, don't bother to try to migrate
>> +				 * other folios, just exit.
>> +				 */
>> +				nr_failed++;
>> +				stats->nr_failed_pages += nr_pages;
>> +				goto out;
>> +			case -EAGAIN:
>> +				retry++;
>> +				nr_retry_pages += nr_pages;
>> +				break;
>> +			case MIGRATEPAGE_SUCCESS:
>> +				stats->nr_succeeded += nr_pages;
>> +				break;
>> +			default:
>> +				/*
>> +				 * Permanent failure (-EBUSY, etc.):
>> +				 * unlike -EAGAIN case, the failed folio is
>> +				 * removed from migration folio list and not
>> +				 * retried in the next outer loop.
>> +				 */
>> +				nr_failed++;
>> +				stats->nr_failed_pages += nr_pages;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +out:
>> +	nr_failed += retry;
>> +	stats->nr_failed_pages += nr_retry_pages;
>> +	if (rc != -ENOMEM)
>> +		rc = nr_failed;
>> +
>> +	return rc;
>> +}
>
> The interpretation of the return value of this function is somewhat
> unobvious.
>
> I suggest that this function be fully commented.
>
> Why does a retry contribute to nr_failed.  What is the interpretation
> of nr_failed.  etcetera.

Sure.  Will do that in the next version.
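
For example, based on the code above (just a sketch, to be refined):

/*
 * Returns the number of hugetlb folios that were not migrated, where
 * folios still pending a retry after the final pass are counted as
 * failures, or -ENOMEM if a destination folio could not be allocated,
 * in which case the migration of the remaining folios is aborted.
 */
static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
			    free_page_t put_new_page, unsigned long private,
			    enum migrate_mode mode, int reason,
			    struct migrate_pages_stats *stats,
			    struct list_head *ret_folios)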

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/8] migrate_pages: organize stats with struct migrate_pages_stats
  2022-12-27  0:28 ` [PATCH 1/8] migrate_pages: organize stats with struct migrate_pages_stats Huang Ying
@ 2023-01-03 18:06   ` Zi Yan
  2023-01-05  3:02   ` Alistair Popple
  1 sibling, 0 replies; 39+ messages in thread
From: Zi Yan @ 2023-01-03 18:06 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

[-- Attachment #1: Type: text/plain, Size: 865 bytes --]

On 26 Dec 2022, at 19:28, Huang Ying wrote:

> Define struct migrate_pages_stats to organize the various statistics
> in migrate_pages().  This makes it easier to collect and consume the
> statistics in multiple functions.  This will be needed in the
> following patches in the series.
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Bharata B Rao <bharata@amd.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: haoxin <xhao@linux.alibaba.com>
> ---
>  mm/migrate.c | 58 +++++++++++++++++++++++++++++-----------------------
>  1 file changed, 32 insertions(+), 26 deletions(-)

LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/8] migrate_pages: restrict number of pages to migrate in batch
  2022-12-27  0:28 ` [PATCH 3/8] migrate_pages: restrict number of pages to migrate in batch Huang Ying
@ 2023-01-03 18:40   ` Zi Yan
  2023-01-04  0:24     ` Huang, Ying
  0 siblings, 1 reply; 39+ messages in thread
From: Zi Yan @ 2023-01-03 18:40 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

[-- Attachment #1: Type: text/plain, Size: 13236 bytes --]

On 26 Dec 2022, at 19:28, Huang Ying wrote:

> This is a preparation patch to batch the folio unmapping and moving
> for non-hugetlb folios.
>
> If we had batched the folio unmapping, all folios to be migrated would
> be unmapped before copying the contents and flags of the folios.  If
> too many folios (in units of pages) were passed to migrate_pages(),
> the execution of the processes would be stopped for too long, and the
> latency would be hurt.  For example, the migrate_pages() syscall will
> call migrate_pages() with all folios of a process.  To avoid this
> possible issue, in this patch, we restrict the number of pages to be
> migrated in a batch to no more than HPAGE_PMD_NR.  That is, the
> influence is at the same level as THP migration.
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Bharata B Rao <bharata@amd.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: haoxin <xhao@linux.alibaba.com>
> ---
>  mm/migrate.c | 173 +++++++++++++++++++++++++++++++--------------------
>  1 file changed, 106 insertions(+), 67 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index bdbe73fe2eb7..97ea0737ab2b 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1485,40 +1485,15 @@ static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
>  	return rc;
>  }
>
> -/*
> - * migrate_pages - migrate the folios specified in a list, to the free folios
> - *		   supplied as the target for the page migration
> - *
> - * @from:		The list of folios to be migrated.
> - * @get_new_page:	The function used to allocate free folios to be used
> - *			as the target of the folio migration.
> - * @put_new_page:	The function used to free target folios if migration
> - *			fails, or NULL if no special handling is necessary.
> - * @private:		Private data to be passed on to get_new_page()
> - * @mode:		The migration mode that specifies the constraints for
> - *			folio migration, if any.
> - * @reason:		The reason for folio migration.
> - * @ret_succeeded:	Set to the number of folios migrated successfully if
> - *			the caller passes a non-NULL pointer.
> - *
> - * The function returns after 10 attempts or if no folios are movable any more
> - * because the list has become empty or no retryable folios exist any more.
> - * It is caller's responsibility to call putback_movable_pages() to return folios
> - * to the LRU or free list only if ret != 0.
> - *
> - * Returns the number of {normal folio, large folio, hugetlb} that were not
> - * migrated, or an error code. The number of large folio splits will be
> - * considered as the number of non-migrated large folio, no matter how many
> - * split folios of the large folio are migrated successfully.
> - */
> -int migrate_pages(struct list_head *from, new_page_t get_new_page,
> +static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  		free_page_t put_new_page, unsigned long private,
> -		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
> +		enum migrate_mode mode, int reason, struct list_head *ret_folios,
> +		struct migrate_pages_stats *stats)
>  {
>  	int retry = 1;
>  	int large_retry = 1;
>  	int thp_retry = 1;
> -	int nr_failed;
> +	int nr_failed = 0;
>  	int nr_retry_pages = 0;
>  	int nr_large_failed = 0;
>  	int pass = 0;
> @@ -1526,20 +1501,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  	bool is_thp = false;
>  	struct folio *folio, *folio2;
>  	int rc, nr_pages;
> -	LIST_HEAD(ret_folios);
>  	LIST_HEAD(split_folios);
>  	bool nosplit = (reason == MR_NUMA_MISPLACED);
>  	bool no_split_folio_counting = false;
> -	struct migrate_pages_stats stats;
> -
> -	trace_mm_migrate_pages_start(mode, reason);
> -
> -	memset(&stats, 0, sizeof(stats));
> -	rc = migrate_hugetlbs(from, get_new_page, put_new_page, private, mode, reason,
> -			      &stats, &ret_folios);
> -	if (rc < 0)
> -		goto out;
> -	nr_failed = rc;
>
>  split_folio_migration:
>  	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
> @@ -1549,11 +1513,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  		nr_retry_pages = 0;
>
>  		list_for_each_entry_safe(folio, folio2, from, lru) {
> -			if (folio_test_hugetlb(folio)) {
> -				list_move_tail(&folio->lru, &ret_folios);
> -				continue;
> -			}
> -
>  			/*
>  			 * Large folio statistics is based on the source large
>  			 * folio. Capture required information that might get
> @@ -1567,15 +1526,14 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>
>  			rc = unmap_and_move(get_new_page, put_new_page,
>  					    private, folio, pass > 2, mode,
> -					    reason, &ret_folios);
> +					    reason, ret_folios);
>  			/*
>  			 * The rules are:
>  			 *	Success: folio will be freed
>  			 *	-EAGAIN: stay on the from list
>  			 *	-ENOMEM: stay on the from list
>  			 *	-ENOSYS: stay on the from list
> -			 *	Other errno: put on ret_folios list then splice to
> -			 *		     from list
> +			 *	Other errno: put on ret_folios list
>  			 */
>  			switch(rc) {
>  			/*
> @@ -1592,17 +1550,17 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				/* Large folio migration is unsupported */
>  				if (is_large) {
>  					nr_large_failed++;
> -					stats.nr_thp_failed += is_thp;
> +					stats->nr_thp_failed += is_thp;
>  					if (!try_split_folio(folio, &split_folios)) {
> -						stats.nr_thp_split += is_thp;
> +						stats->nr_thp_split += is_thp;
>  						break;
>  					}
>  				} else if (!no_split_folio_counting) {
>  					nr_failed++;
>  				}
>
> -				stats.nr_failed_pages += nr_pages;
> -				list_move_tail(&folio->lru, &ret_folios);
> +				stats->nr_failed_pages += nr_pages;
> +				list_move_tail(&folio->lru, ret_folios);
>  				break;
>  			case -ENOMEM:
>  				/*
> @@ -1611,13 +1569,13 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				 */
>  				if (is_large) {
>  					nr_large_failed++;
> -					stats.nr_thp_failed += is_thp;
> +					stats->nr_thp_failed += is_thp;
>  					/* Large folio NUMA faulting doesn't split to retry. */
>  					if (!nosplit) {
>  						int ret = try_split_folio(folio, &split_folios);
>
>  						if (!ret) {
> -							stats.nr_thp_split += is_thp;
> +							stats->nr_thp_split += is_thp;
>  							break;
>  						} else if (reason == MR_LONGTERM_PIN &&
>  							   ret == -EAGAIN) {
> @@ -1635,17 +1593,17 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  					nr_failed++;
>  				}
>
> -				stats.nr_failed_pages += nr_pages + nr_retry_pages;
> +				stats->nr_failed_pages += nr_pages + nr_retry_pages;
>  				/*
>  				 * There might be some split folios of fail-to-migrate large
> -				 * folios left in split_folios list. Move them back to migration
> +				 * folios left in split_folios list. Move them to ret_folios
>  				 * list so that they could be put back to the right list by
>  				 * the caller otherwise the folio refcnt will be leaked.
>  				 */
> -				list_splice_init(&split_folios, from);
> +				list_splice_init(&split_folios, ret_folios);
>  				/* nr_failed isn't updated for not used */
>  				nr_large_failed += large_retry;
> -				stats.nr_thp_failed += thp_retry;
> +				stats->nr_thp_failed += thp_retry;
>  				goto out;
>  			case -EAGAIN:
>  				if (is_large) {
> @@ -1657,8 +1615,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				nr_retry_pages += nr_pages;
>  				break;
>  			case MIGRATEPAGE_SUCCESS:
> -				stats.nr_succeeded += nr_pages;
> -				stats.nr_thp_succeeded += is_thp;
> +				stats->nr_succeeded += nr_pages;
> +				stats->nr_thp_succeeded += is_thp;
>  				break;
>  			default:
>  				/*
> @@ -1669,20 +1627,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				 */
>  				if (is_large) {
>  					nr_large_failed++;
> -					stats.nr_thp_failed += is_thp;
> +					stats->nr_thp_failed += is_thp;
>  				} else if (!no_split_folio_counting) {
>  					nr_failed++;
>  				}
>
> -				stats.nr_failed_pages += nr_pages;
> +				stats->nr_failed_pages += nr_pages;
>  				break;
>  			}
>  		}
>  	}
>  	nr_failed += retry;
>  	nr_large_failed += large_retry;
> -	stats.nr_thp_failed += thp_retry;
> -	stats.nr_failed_pages += nr_retry_pages;
> +	stats->nr_thp_failed += thp_retry;
> +	stats->nr_failed_pages += nr_retry_pages;
>  	/*
>  	 * Try to migrate split folios of fail-to-migrate large folios, no
>  	 * nr_failed counting in this round, since all split folios of a
> @@ -1693,7 +1651,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  		 * Move non-migrated folios (after 10 retries) to ret_folios
>  		 * to avoid migrating them again.
>  		 */
> -		list_splice_init(from, &ret_folios);
> +		list_splice_init(from, ret_folios);
>  		list_splice_init(&split_folios, from);
>  		no_split_folio_counting = true;
>  		retry = 1;
> @@ -1701,6 +1659,87 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  	}
>
>  	rc = nr_failed + nr_large_failed;
> +out:
> +	return rc;
> +}
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#define NR_MAX_BATCHED_MIGRATION	HPAGE_PMD_NR
> +#else
> +#define NR_MAX_BATCHED_MIGRATION	512
> +#endif
> +
> +/*
> + * migrate_pages - migrate the folios specified in a list, to the free folios
> + *		   supplied as the target for the page migration
> + *
> + * @from:		The list of folios to be migrated.
> + * @get_new_page:	The function used to allocate free folios to be used
> + *			as the target of the folio migration.
> + * @put_new_page:	The function used to free target folios if migration
> + *			fails, or NULL if no special handling is necessary.
> + * @private:		Private data to be passed on to get_new_page()
> + * @mode:		The migration mode that specifies the constraints for
> + *			folio migration, if any.
> + * @reason:		The reason for folio migration.
> + * @ret_succeeded:	Set to the number of folios migrated successfully if
> + *			the caller passes a non-NULL pointer.
> + *
> + * The function returns after 10 attempts or if no folios are movable any more
> + * because the list has become empty or no retryable folios exist any more.
> + * It is caller's responsibility to call putback_movable_pages() to return folios
> + * to the LRU or free list only if ret != 0.
> + *
> + * Returns the number of {normal folio, large folio, hugetlb} that were not
> + * migrated, or an error code. The number of large folio splits will be
> + * considered as the number of non-migrated large folio, no matter how many
> + * split folios of the large folio are migrated successfully.
> + */
> +int migrate_pages(struct list_head *from, new_page_t get_new_page,
> +		free_page_t put_new_page, unsigned long private,
> +		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
> +{
> +	int rc, rc_gether;

rc_gether -> rc_gather?

> +	int nr_pages;
> +	struct folio *folio, *folio2;
> +	LIST_HEAD(folios);
> +	LIST_HEAD(ret_folios);
> +	struct migrate_pages_stats stats;
> +
> +	trace_mm_migrate_pages_start(mode, reason);
> +
> +	memset(&stats, 0, sizeof(stats));
> +
> +	rc_gether = migrate_hugetlbs(from, get_new_page, put_new_page, private,
> +				     mode, reason, &stats, &ret_folios);
> +	if (rc_gether < 0)
> +		goto out;
> +again:
> +	nr_pages = 0;
> +	list_for_each_entry_safe(folio, folio2, from, lru) {
> +		if (folio_test_hugetlb(folio)) {
> +			list_move_tail(&folio->lru, &ret_folios);
> +			continue;
> +		}
> +
> +		nr_pages += folio_nr_pages(folio);
> +		if (nr_pages > NR_MAX_BATCHED_MIGRATION)
> +			break;
> +	}
> +	if (nr_pages > NR_MAX_BATCHED_MIGRATION)
> +		list_cut_before(&folios, from, &folio->lru);
> +	else
> +		list_splice_init(from, &folios);
> +	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
> +				 mode, reason, &ret_folios, &stats);
> +	list_splice_tail_init(&folios, &ret_folios);
> +	if (rc < 0) {
> +		rc_gether = rc;
> +		goto out;
> +	}
> +	rc_gether += rc;
> +	if (!list_empty(from))
> +		goto again;
>  out:
>  	/*
>  	 * Put the permanent failure folio back to migration list, they
> @@ -1713,7 +1752,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  	 * are migrated successfully.
>  	 */
>  	if (list_empty(from))
> -		rc = 0;
> +		rc_gether = 0;
>
>  	count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded);
>  	count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages);
> @@ -1727,7 +1766,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  	if (ret_succeeded)
>  		*ret_succeeded = stats.nr_succeeded;
>
> -	return rc;
> +	return rc_gether;
>  }
>
>  struct page *alloc_migration_target(struct page *page, unsigned long private)
> -- 
> 2.35.1


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/8] migrate_pages: split unmap_and_move() to _unmap() and _move()
  2022-12-27  0:28 ` [PATCH 4/8] migrate_pages: split unmap_and_move() to _unmap() and _move() Huang Ying
@ 2023-01-03 18:55   ` Zi Yan
  2023-01-05 18:26   ` Nathan Chancellor
  1 sibling, 0 replies; 39+ messages in thread
From: Zi Yan @ 2023-01-03 18:55 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

[-- Attachment #1: Type: text/plain, Size: 9820 bytes --]

On 26 Dec 2022, at 19:28, Huang Ying wrote:

> This is a preparation patch to batch the folio unmapping and moving.
>
> In this patch, unmap_and_move() is split into migrate_folio_unmap() and
> migrate_folio_move(), so that we can batch _unmap() and _move() in
> different loops later.  To pass some information between unmap and
> move, the otherwise unused dst->mapping and dst->private are used.
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Bharata B Rao <bharata@amd.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: haoxin <xhao@linux.alibaba.com>
> ---
>  include/linux/migrate.h |   1 +
>  mm/migrate.c            | 162 +++++++++++++++++++++++++++++-----------
>  2 files changed, 121 insertions(+), 42 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 3ef77f52a4f0..7376074f2e1e 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -18,6 +18,7 @@ struct migration_target_control;
>   * - zero on page migration success;
>   */
>  #define MIGRATEPAGE_SUCCESS		0
> +#define MIGRATEPAGE_UNMAP		1
>
>  /**
>   * struct movable_operations - Driver page migration
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 97ea0737ab2b..e2383b430932 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1009,11 +1009,29 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
>  	return rc;
>  }
>
> -static int __unmap_and_move(struct folio *src, struct folio *dst,
> +static void __migrate_folio_record(struct folio *dst,
> +				   unsigned long page_was_mapped,
> +				   struct anon_vma *anon_vma)
> +{
> +	dst->mapping = (struct address_space *)anon_vma;
> +	dst->private = (void *)page_was_mapped;
> +}
> +
> +static void __migrate_folio_extract(struct folio *dst,
> +				   int *page_was_mappedp,
> +				   struct anon_vma **anon_vmap)
> +{
> +	*anon_vmap = (struct anon_vma *)dst->mapping;
> +	*page_was_mappedp = (unsigned long)dst->private;
> +	dst->mapping = NULL;
> +	dst->private = NULL;
> +}
> +

We probably need comments on these two functions about how dst->mapping
and dst->private are used.  It might be hard to figure out the behavior
later by digging into the git log and/or looking at the code.
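
For instance, something along these lines might be enough (sketch only):

/*
 * Stash the unmap state in the destination folio: dst->mapping
 * temporarily holds the anon_vma pointer and dst->private holds
 * page_was_mapped.  Both fields are otherwise unused on the newly
 * allocated dst at this point, and both are cleared again by
 * __migrate_folio_extract() in the _move stage.
 */
static void __migrate_folio_record(struct folio *dst,
				   unsigned long page_was_mapped,
				   struct anon_vma *anon_vma)
{
	dst->mapping = (struct address_space *)anon_vma;
	dst->private = (void *)page_was_mapped;
}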

> +static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>  				int force, enum migrate_mode mode)
>  {
>  	int rc = -EAGAIN;
> -	bool page_was_mapped = false;
> +	int page_was_mapped = 0;
>  	struct anon_vma *anon_vma = NULL;
>  	bool is_lru = !__PageMovable(&src->page);
>
> @@ -1089,8 +1107,8 @@ static int __unmap_and_move(struct folio *src, struct folio *dst,
>  		goto out_unlock;
>
>  	if (unlikely(!is_lru)) {
> -		rc = move_to_new_folio(dst, src, mode);
> -		goto out_unlock_both;
> +		__migrate_folio_record(dst, page_was_mapped, anon_vma);
> +		return MIGRATEPAGE_UNMAP;
>  	}
>
>  	/*
> @@ -1115,11 +1133,40 @@ static int __unmap_and_move(struct folio *src, struct folio *dst,
>  		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
>  			       !folio_test_ksm(src) && !anon_vma, src);
>  		try_to_migrate(src, 0);
> -		page_was_mapped = true;
> +		page_was_mapped = 1;
>  	}
>
> -	if (!folio_mapped(src))
> -		rc = move_to_new_folio(dst, src, mode);
> +	if (!folio_mapped(src)) {
> +		__migrate_folio_record(dst, page_was_mapped, anon_vma);
> +		return MIGRATEPAGE_UNMAP;
> +	}
> +
> +
> +	if (page_was_mapped)
> +		remove_migration_ptes(src, src, false);
> +
> +out_unlock_both:
> +	folio_unlock(dst);
> +out_unlock:
> +	/* Drop an anon_vma reference if we took one */
> +	if (anon_vma)
> +		put_anon_vma(anon_vma);
> +	folio_unlock(src);
> +out:
> +
> +	return rc;
> +}
> +
> +static int __migrate_folio_move(struct folio *src, struct folio *dst,
> +				enum migrate_mode mode)
> +{
> +	int rc;
> +	int page_was_mapped = 0;
> +	struct anon_vma *anon_vma = NULL;
> +
> +	__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
> +
> +	rc = move_to_new_folio(dst, src, mode);
>
>  	/*
>  	 * When successful, push dst to LRU immediately: so that if it
> @@ -1140,14 +1187,11 @@ static int __unmap_and_move(struct folio *src, struct folio *dst,
>  		remove_migration_ptes(src,
>  			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
>
> -out_unlock_both:
>  	folio_unlock(dst);
> -out_unlock:
>  	/* Drop an anon_vma reference if we took one */
>  	if (anon_vma)
>  		put_anon_vma(anon_vma);
>  	folio_unlock(src);
> -out:
>  	/*
>  	 * If migration is successful, decrease refcount of dst,
>  	 * which will not free the page because new page owner increased
> @@ -1159,19 +1203,32 @@ static int __unmap_and_move(struct folio *src, struct folio *dst,
>  	return rc;
>  }
>
> -/*
> - * Obtain the lock on folio, remove all ptes and migrate the folio
> - * to the newly allocated folio in dst.
> - */
> -static int unmap_and_move(new_page_t get_new_page,
> -				   free_page_t put_new_page,
> -				   unsigned long private, struct folio *src,
> -				   int force, enum migrate_mode mode,
> -				   enum migrate_reason reason,
> -				   struct list_head *ret)
> +static void migrate_folio_done(struct folio *src,
> +			       enum migrate_reason reason)
> +{
> +	/*
> +	 * Compaction can migrate also non-LRU pages which are
> +	 * not accounted to NR_ISOLATED_*. They can be recognized
> +	 * as __PageMovable
> +	 */
> +	if (likely(!__folio_test_movable(src)))
> +		mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
> +				    folio_is_file_lru(src), -folio_nr_pages(src));
> +
> +	if (reason != MR_MEMORY_FAILURE)
> +		/* We release the page in page_handle_poison. */
> +		folio_put(src);
> +}
> +
> +/* Obtain the lock on page, remove all ptes. */
> +static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
> +			       unsigned long private, struct folio *src,
> +			       struct folio **dstp, int force,
> +			       enum migrate_mode mode, enum migrate_reason reason,
> +			       struct list_head *ret)
>  {
>  	struct folio *dst;
> -	int rc = MIGRATEPAGE_SUCCESS;
> +	int rc = MIGRATEPAGE_UNMAP;
>  	struct page *newpage = NULL;
>
>  	if (!thp_migration_supported() && folio_test_transhuge(src))
> @@ -1182,20 +1239,50 @@ static int unmap_and_move(new_page_t get_new_page,
>  		folio_clear_active(src);
>  		folio_clear_unevictable(src);
>  		/* free_pages_prepare() will clear PG_isolated. */
> -		goto out;
> +		list_del(&src->lru);
> +		migrate_folio_done(src, reason);
> +		return MIGRATEPAGE_SUCCESS;
>  	}
>
>  	newpage = get_new_page(&src->page, private);
>  	if (!newpage)
>  		return -ENOMEM;
>  	dst = page_folio(newpage);
> +	*dstp = dst;
>
>  	dst->private = NULL;
> -	rc = __unmap_and_move(src, dst, force, mode);
> +	rc = __migrate_folio_unmap(src, dst, force, mode);
> +	if (rc == MIGRATEPAGE_UNMAP)
> +		return rc;
> +
> +	/*
> +	 * A page that has not been migrated will have kept its
> +	 * references and be restored.
> +	 */
> +	/* restore the folio to right list. */
> +	if (rc != -EAGAIN)
> +		list_move_tail(&src->lru, ret);
> +
> +	if (put_new_page)
> +		put_new_page(&dst->page, private);
> +	else
> +		folio_put(dst);
> +
> +	return rc;
> +}
> +
> +/* Migrate the folio to the newly allocated folio in dst. */
> +static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
> +			      struct folio *src, struct folio *dst,
> +			      enum migrate_mode mode, enum migrate_reason reason,
> +			      struct list_head *ret)
> +{
> +	int rc;
> +
> +	rc = __migrate_folio_move(src, dst, mode);
>  	if (rc == MIGRATEPAGE_SUCCESS)
>  		set_page_owner_migrate_reason(&dst->page, reason);
>
> -out:
>  	if (rc != -EAGAIN) {
>  		/*
>  		 * A folio that has been migrated has all references
> @@ -1211,20 +1298,7 @@ static int unmap_and_move(new_page_t get_new_page,
>  	 * we want to retry.
>  	 */
>  	if (rc == MIGRATEPAGE_SUCCESS) {
> -		/*
> -		 * Compaction can migrate also non-LRU folios which are
> -		 * not accounted to NR_ISOLATED_*. They can be recognized
> -		 * as __folio_test_movable
> -		 */
> -		if (likely(!__folio_test_movable(src)))
> -			mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
> -					folio_is_file_lru(src), -folio_nr_pages(src));
> -
> -		if (reason != MR_MEMORY_FAILURE)
> -			/*
> -			 * We release the folio in page_handle_poison.
> -			 */
> -			folio_put(src);
> +		migrate_folio_done(src, reason);
>  	} else {
>  		if (rc != -EAGAIN)
>  			list_add_tail(&src->lru, ret);
> @@ -1499,7 +1573,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  	int pass = 0;
>  	bool is_large = false;
>  	bool is_thp = false;
> -	struct folio *folio, *folio2;
> +	struct folio *folio, *folio2, *dst = NULL;
>  	int rc, nr_pages;
>  	LIST_HEAD(split_folios);
>  	bool nosplit = (reason == MR_NUMA_MISPLACED);
> @@ -1524,9 +1598,13 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>
>  			cond_resched();
>
> -			rc = unmap_and_move(get_new_page, put_new_page,
> -					    private, folio, pass > 2, mode,
> -					    reason, ret_folios);
> +			rc = migrate_folio_unmap(get_new_page, put_new_page, private,
> +						 folio, &dst, pass > 2, mode,
> +						 reason, ret_folios);
> +			if (rc == MIGRATEPAGE_UNMAP)
> +				rc = migrate_folio_move(put_new_page, private,
> +							folio, dst, mode,
> +							reason, ret_folios);
>  			/*
>  			 * The rules are:
>  			 *	Success: folio will be freed
> -- 
> 2.35.1


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/8] migrate_pages: batch _unmap and _move
  2022-12-27  0:28 ` [PATCH 5/8] migrate_pages: batch _unmap and _move Huang Ying
  2022-12-28 23:22   ` Andrew Morton
@ 2023-01-03 19:01   ` Zi Yan
  2023-01-04  0:34     ` Huang, Ying
  1 sibling, 1 reply; 39+ messages in thread
From: Zi Yan @ 2023-01-03 19:01 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

[-- Attachment #1: Type: text/plain, Size: 12080 bytes --]

On 26 Dec 2022, at 19:28, Huang Ying wrote:

> In this patch the _unmap and _move stages of the folio migration are
> batched.  That is, previously it was,
>
>   for each folio
>     _unmap()
>     _move()
>
> Now, it is,
>
>   for each folio
>     _unmap()
>   for each folio
>     _move()

Also worth adding some notes here: we need extra code to undo the _unmap()
if _move() fails.  Andrew has asked for comments on *_undo_src/dst(),
but I think it would be better to describe the new high-level workflow,
in the form of pseudo code, in the git log and in the comment for
migrate_pages().  The extra cleanup code for a failed _move() after a
previously successful _unmap() might not be obvious to everyone.
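
Something like the following might do (rough sketch, not literal code):

  /* unmap stage */
  for each folio
    rc = migrate_folio_unmap()
    on MIGRATEPAGE_UNMAP: move src to unmap_folios, dst to dst_folios
    otherwise: handle the failure

  /* move stage */
  for each (src, dst) on (unmap_folios, dst_folios)
    rc = migrate_folio_move()

  /* folios that were unmapped but could not be moved */
  for each remaining (src, dst) pair
    migrate_folio_undo_src()	/* restore PTEs, unlock, put back src */
    migrate_folio_undo_dst()	/* unlock and release dst */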

>
> Based on this, we can batch the TLB flushing and use some hardware
> accelerator to copy folios between batched _unmap and batched _move
> stages.
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Bharata B Rao <bharata@amd.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: haoxin <xhao@linux.alibaba.com>
> ---
>  mm/migrate.c | 189 ++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 165 insertions(+), 24 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e2383b430932..dd68c3de3da8 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1027,8 +1027,32 @@ static void __migrate_folio_extract(struct folio *dst,
>  	dst->private = NULL;
>  }
>
> +static void migrate_folio_undo_src(struct folio *src,
> +				   int page_was_mapped,
> +				   struct anon_vma *anon_vma,
> +				   struct list_head *ret)
> +{
> +	if (page_was_mapped)
> +		remove_migration_ptes(src, src, false);
> +	if (anon_vma)
> +		put_anon_vma(anon_vma);
> +	folio_unlock(src);
> +	list_move_tail(&src->lru, ret);
> +}
> +
> +static void migrate_folio_undo_dst(struct folio *dst,
> +				   free_page_t put_new_page,
> +				   unsigned long private)
> +{
> +	folio_unlock(dst);
> +	if (put_new_page)
> +		put_new_page(&dst->page, private);
> +	else
> +		folio_put(dst);
> +}
> +
>  static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
> -				int force, enum migrate_mode mode)
> +				 int force, bool force_lock, enum migrate_mode mode)
>  {
>  	int rc = -EAGAIN;
>  	int page_was_mapped = 0;
> @@ -1055,6 +1079,11 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>  		if (current->flags & PF_MEMALLOC)
>  			goto out;
>
> +		if (!force_lock) {
> +			rc = -EDEADLOCK;
> +			goto out;
> +		}
> +
>  		folio_lock(src);
>  	}
>
> @@ -1168,6 +1197,8 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
>
>  	rc = move_to_new_folio(dst, src, mode);
>
> +	if (rc != -EAGAIN)
> +		list_del(&dst->lru);
>  	/*
>  	 * When successful, push dst to LRU immediately: so that if it
>  	 * turns out to be an mlocked page, remove_migration_ptes() will
> @@ -1183,6 +1214,11 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
>  			lru_add_drain();
>  	}
>
> +	if (rc == -EAGAIN) {
> +		__migrate_folio_record(dst, page_was_mapped, anon_vma);
> +		return rc;
> +	}
> +
>  	if (page_was_mapped)
>  		remove_migration_ptes(src,
>  			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
> @@ -1223,7 +1259,7 @@ static void migrate_folio_done(struct folio *src,
>  /* Obtain the lock on page, remove all ptes. */
>  static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
>  			       unsigned long private, struct folio *src,
> -			       struct folio **dstp, int force,
> +			       struct folio **dstp, int force, bool force_lock,
>  			       enum migrate_mode mode, enum migrate_reason reason,
>  			       struct list_head *ret)
>  {
> @@ -1251,7 +1287,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>  	*dstp = dst;
>
>  	dst->private = NULL;
> -	rc = __migrate_folio_unmap(src, dst, force, mode);
> +	rc = __migrate_folio_unmap(src, dst, force, force_lock, mode);
>  	if (rc == MIGRATEPAGE_UNMAP)
>  		return rc;
>
> @@ -1260,7 +1296,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>  	 * references and be restored.
>  	 */
>  	/* restore the folio to right list. */
> -	if (rc != -EAGAIN)
> +	if (rc != -EAGAIN && rc != -EDEADLOCK)
>  		list_move_tail(&src->lru, ret);
>
>  	if (put_new_page)
> @@ -1299,9 +1335,8 @@ static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
>  	 */
>  	if (rc == MIGRATEPAGE_SUCCESS) {
>  		migrate_folio_done(src, reason);
> -	} else {
> -		if (rc != -EAGAIN)
> -			list_add_tail(&src->lru, ret);
> +	} else if (rc != -EAGAIN) {
> +		list_add_tail(&src->lru, ret);
>
>  		if (put_new_page)
>  			put_new_page(&dst->page, private);
> @@ -1564,7 +1599,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  		enum migrate_mode mode, int reason, struct list_head *ret_folios,
>  		struct migrate_pages_stats *stats)
>  {
> -	int retry = 1;
> +	int retry;
>  	int large_retry = 1;
>  	int thp_retry = 1;
>  	int nr_failed = 0;
> @@ -1573,13 +1608,19 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  	int pass = 0;
>  	bool is_large = false;
>  	bool is_thp = false;
> -	struct folio *folio, *folio2, *dst = NULL;
> -	int rc, nr_pages;
> +	struct folio *folio, *folio2, *dst = NULL, *dst2;
> +	int rc, rc_saved, nr_pages;
>  	LIST_HEAD(split_folios);
> +	LIST_HEAD(unmap_folios);
> +	LIST_HEAD(dst_folios);
>  	bool nosplit = (reason == MR_NUMA_MISPLACED);
>  	bool no_split_folio_counting = false;
> +	bool force_lock;
>
> -split_folio_migration:
> +retry:
> +	rc_saved = 0;
> +	force_lock = true;
> +	retry = 1;
>  	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
>  		retry = 0;
>  		large_retry = 0;
> @@ -1599,16 +1640,15 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  			cond_resched();
>
>  			rc = migrate_folio_unmap(get_new_page, put_new_page, private,
> -						 folio, &dst, pass > 2, mode,
> -						 reason, ret_folios);
> -			if (rc == MIGRATEPAGE_UNMAP)
> -				rc = migrate_folio_move(put_new_page, private,
> -							folio, dst, mode,
> -							reason, ret_folios);
> +						 folio, &dst, pass > 2, force_lock,
> +						 mode, reason, ret_folios);
>  			/*
>  			 * The rules are:
>  			 *	Success: folio will be freed
> +			 *	Unmap: folio will be put on unmap_folios list,
> +			 *	       dst folio put on dst_folios list
>  			 *	-EAGAIN: stay on the from list
> +			 *	-EDEADLOCK: stay on the from list
>  			 *	-ENOMEM: stay on the from list
>  			 *	-ENOSYS: stay on the from list
>  			 *	Other errno: put on ret_folios list
> @@ -1643,7 +1683,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  			case -ENOMEM:
>  				/*
>  				 * When memory is low, don't bother to try to migrate
> -				 * other folios, just exit.
> +				 * other folios, move unmapped folios, then exit.
>  				 */
>  				if (is_large) {
>  					nr_large_failed++;
> @@ -1682,7 +1722,14 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  				/* nr_failed isn't updated for not used */
>  				nr_large_failed += large_retry;
>  				stats->nr_thp_failed += thp_retry;
> -				goto out;
> +				rc_saved = rc;
> +				if (list_empty(&unmap_folios))
> +					goto out;
> +				else
> +					goto move;
> +			case -EDEADLOCK:
> +				rc_saved = rc;
> +				goto move;
>  			case -EAGAIN:
>  				if (is_large) {
>  					large_retry++;
> @@ -1696,6 +1743,15 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  				stats->nr_succeeded += nr_pages;
>  				stats->nr_thp_succeeded += is_thp;
>  				break;
> +			case MIGRATEPAGE_UNMAP:
> +				/*
> +				 * We have locked some pages, don't force lock
> +				 * to avoid deadlock.
> +				 */
> +				force_lock = false;
> +				list_move_tail(&folio->lru, &unmap_folios);
> +				list_add_tail(&dst->lru, &dst_folios);
> +				break;
>  			default:
>  				/*
>  				 * Permanent failure (-EBUSY, etc.):
> @@ -1719,12 +1775,93 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  	nr_large_failed += large_retry;
>  	stats->nr_thp_failed += thp_retry;
>  	stats->nr_failed_pages += nr_retry_pages;
> +move:
> +	retry = 1;
> +	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
> +		retry = 0;
> +		large_retry = 0;
> +		thp_retry = 0;
> +		nr_retry_pages = 0;
> +
> +		dst = list_first_entry(&dst_folios, struct folio, lru);
> +		dst2 = list_next_entry(dst, lru);
> +		list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
> +			is_large = folio_test_large(folio);
> +			is_thp = is_large && folio_test_pmd_mappable(folio);
> +			nr_pages = folio_nr_pages(folio);
> +
> +			cond_resched();
> +
> +			rc = migrate_folio_move(put_new_page, private,
> +						folio, dst, mode,
> +						reason, ret_folios);
> +			/*
> +			 * The rules are:
> +			 *	Success: folio will be freed
> +			 *	-EAGAIN: stay on the unmap_folios list
> +			 *	Other errno: put on ret_folios list
> +			 */
> +			switch(rc) {
> +			case -EAGAIN:
> +				if (is_large) {
> +					large_retry++;
> +					thp_retry += is_thp;
> +				} else if (!no_split_folio_counting) {
> +					retry++;
> +				}
> +				nr_retry_pages += nr_pages;
> +				break;
> +			case MIGRATEPAGE_SUCCESS:
> +				stats->nr_succeeded += nr_pages;
> +				stats->nr_thp_succeeded += is_thp;
> +				break;
> +			default:
> +				if (is_large) {
> +					nr_large_failed++;
> +					stats->nr_thp_failed += is_thp;
> +				} else if (!no_split_folio_counting) {
> +					nr_failed++;
> +				}
> +
> +				stats->nr_failed_pages += nr_pages;
> +				break;
> +			}
> +			dst = dst2;
> +			dst2 = list_next_entry(dst, lru);
> +		}
> +	}
> +	nr_failed += retry;
> +	nr_large_failed += large_retry;
> +	stats->nr_thp_failed += thp_retry;
> +	stats->nr_failed_pages += nr_retry_pages;
> +
> +	if (rc_saved)
> +		rc = rc_saved;
> +	else
> +		rc = nr_failed + nr_large_failed;
> +out:
> +	/* Cleanup remaining folios */
> +	dst = list_first_entry(&dst_folios, struct folio, lru);
> +	dst2 = list_next_entry(dst, lru);
> +	list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
> +		int page_was_mapped = 0;
> +		struct anon_vma *anon_vma = NULL;
> +
> +		__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
> +		migrate_folio_undo_src(folio, page_was_mapped, anon_vma,
> +				       ret_folios);
> +		list_del(&dst->lru);
> +		migrate_folio_undo_dst(dst, put_new_page, private);
> +		dst = dst2;
> +		dst2 = list_next_entry(dst, lru);
> +	}
> +
>  	/*
>  	 * Try to migrate split folios of fail-to-migrate large folios, no
>  	 * nr_failed counting in this round, since all split folios of a
>  	 * large folio is counted as 1 failure in the first round.
>  	 */
> -	if (!list_empty(&split_folios)) {
> +	if (rc >= 0 && !list_empty(&split_folios)) {
>  		/*
>  		 * Move non-migrated folios (after 10 retries) to ret_folios
>  		 * to avoid migrating them again.
> @@ -1732,12 +1869,16 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  		list_splice_init(from, ret_folios);
>  		list_splice_init(&split_folios, from);
>  		no_split_folio_counting = true;
> -		retry = 1;
> -		goto split_folio_migration;
> +		goto retry;
>  	}
>
> -	rc = nr_failed + nr_large_failed;
> -out:
> +	/*
> +	 * We have unlocked all locked pages, so we can force lock now, let's
> +	 * try again.
> +	 */
> +	if (rc == -EDEADLOCK)
> +		goto retry;
> +
>  	return rc;
>  }
>
> -- 
> 2.35.1


--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/8] migrate_pages: move migrate_folio_done() and migrate_folio_unmap()
  2022-12-27  0:28 ` [PATCH 6/8] migrate_pages: move migrate_folio_done() and migrate_folio_unmap() Huang Ying
@ 2023-01-03 19:02   ` Zi Yan
  2023-01-04  1:26     ` Huang, Ying
  0 siblings, 1 reply; 39+ messages in thread
From: Zi Yan @ 2023-01-03 19:02 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

[-- Attachment #1: Type: text/plain, Size: 309 bytes --]

On 26 Dec 2022, at 19:28, Huang Ying wrote:

> Just move the position of 2 functions.  There's no any functionality
> change.  This is to make it easier to review the next patch via
> putting code near its position in the next patch.

This probably can be merged into prior patches.

--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 8/8] migrate_pages: batch flushing TLB
  2022-12-27  0:28 ` [PATCH 8/8] migrate_pages: batch flushing TLB Huang Ying
@ 2023-01-03 19:19   ` Zi Yan
  2023-01-04  1:41     ` Huang, Ying
  0 siblings, 1 reply; 39+ messages in thread
From: Zi Yan @ 2023-01-03 19:19 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

[-- Attachment #1: Type: text/plain, Size: 4490 bytes --]

On 26 Dec 2022, at 19:28, Huang Ying wrote:

> The TLB flushing will cost quite some CPU cycles during the folio
> migration in some situations.  For example, when migrate a folio of a
> process with multiple active threads that run on multiple CPUs.  After
> batching the _unmap and _move in migrate_pages(), the TLB flushing can
> be batched easily with the existing TLB flush batching mechanism.
> This patch implements that.
>
> We use the following test case to test the patch.
>
> On a 2-socket Intel server,
>
> - Run pmbench memory accessing benchmark
>
> - Run `migratepages` to migrate pages of pmbench between node 0 and
>   node 1 back and forth.
>
> With the patch, the TLB flushing IPI reduces 99.1% during the test and
> the number of pages migrated successfully per second increases 291.7%.
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Bharata B Rao <bharata@amd.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: haoxin <xhao@linux.alibaba.com>
> ---
>  mm/migrate.c |  4 +++-
>  mm/rmap.c    | 20 +++++++++++++++++---
>  2 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 70a40b8fee1f..d7413164e748 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1215,7 +1215,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>  		/* Establish migration ptes */
>  		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
>  			       !folio_test_ksm(src) && !anon_vma, src);
> -		try_to_migrate(src, 0);
> +		try_to_migrate(src, TTU_BATCH_FLUSH);
>  		page_was_mapped = 1;
>  	}
>
> @@ -1732,6 +1732,8 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  	stats->nr_thp_failed += thp_retry;
>  	stats->nr_failed_pages += nr_retry_pages;
>  move:
> +	try_to_unmap_flush();
> +
>  	retry = 1;
>  	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
>  		retry = 0;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index b616870a09be..2e125f3e462e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1976,7 +1976,21 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>  		} else {
>  			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>  			/* Nuke the page table entry. */
> -			pteval = ptep_clear_flush(vma, address, pvmw.pte);
> +			if (should_defer_flush(mm, flags)) {
> +				/*
> +				 * We clear the PTE but do not flush so potentially
> +				 * a remote CPU could still be writing to the folio.
> +				 * If the entry was previously clean then the
> +				 * architecture must guarantee that a clear->dirty
> +				 * transition on a cached TLB entry is written through
> +				 * and traps if the PTE is unmapped.
> +				 */
> +				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
> +
> +				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
> +			} else {
> +				pteval = ptep_clear_flush(vma, address, pvmw.pte);
> +			}
>  		}
>

This is only for PTE mapped pages, right? We also need something similar
in set_pmd_migration_entry() in mm/huge_memory.c for PMD-mapped THPs.
Oh, since you limit NR_MAX_BATCHED_MIGRATION to HPAGE_PMD_NR and count
nr_pages with folio_nr_pages(), THPs will only be migrated one by one.
This is not obvious from the cover letter.

Are you planning to support batched THP migration? If not, it might be
better to update the cover letter to be explicit about it and to add
comments in migrate_pages(). It would also be nice to note that batched
THP migration support would need NR_MAX_BATCHED_MIGRATION to be increased
beyond HPAGE_PMD_NR plus similar changes in set_pmd_migration_entry(),
roughly along the lines sketched below.
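For reference, I imagine the PMD side would need something roughly like
this in set_pmd_migration_entry() (completely untested sketch; defer_flush
is a hypothetical flag that would have to be plumbed into this path, and
it glosses over the fact that pmdp_invalidate() also marks the entry
invalid rather than just clearing it):

	flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
	if (defer_flush) {	/* hypothetical, caller opts in to deferral */
		/* Clear without flushing; the batched flush catches up later. */
		pmdval = pmdp_huge_get_and_clear(mm, address, pvmw->pmd);
		set_tlb_ubc_flush_pending(mm, pmd_dirty(pmdval));
	} else {
		pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
	}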

>  		/* Set the dirty flag on the folio now the pte is gone. */
> @@ -2148,10 +2162,10 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
>
>  	/*
>  	 * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and
> -	 * TTU_SPLIT_HUGE_PMD and TTU_SYNC flags.
> +	 * TTU_SPLIT_HUGE_PMD, TTU_SYNC, and TTU_BATCH_FLUSH flags.
>  	 */
>  	if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
> -					TTU_SYNC)))
> +					TTU_SYNC | TTU_BATCH_FLUSH)))
>  		return;
>
>  	if (folio_is_zone_device(folio) &&
> -- 
> 2.35.1


--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/8] migrate_pages: restrict number of pages to migrate in batch
  2023-01-03 18:40   ` Zi Yan
@ 2023-01-04  0:24     ` Huang, Ying
  0 siblings, 0 replies; 39+ messages in thread
From: Huang, Ying @ 2023-01-04  0:24 UTC (permalink / raw)
  To: Zi Yan
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

Zi Yan <ziy@nvidia.com> writes:

> On 26 Dec 2022, at 19:28, Huang Ying wrote:

[snip]

>> +/*
>> + * migrate_pages - migrate the folios specified in a list, to the free folios
>> + *		   supplied as the target for the page migration
>> + *
>> + * @from:		The list of folios to be migrated.
>> + * @get_new_page:	The function used to allocate free folios to be used
>> + *			as the target of the folio migration.
>> + * @put_new_page:	The function used to free target folios if migration
>> + *			fails, or NULL if no special handling is necessary.
>> + * @private:		Private data to be passed on to get_new_page()
>> + * @mode:		The migration mode that specifies the constraints for
>> + *			folio migration, if any.
>> + * @reason:		The reason for folio migration.
>> + * @ret_succeeded:	Set to the number of folios migrated successfully if
>> + *			the caller passes a non-NULL pointer.
>> + *
>> + * The function returns after 10 attempts or if no folios are movable any more
>> + * because the list has become empty or no retryable folios exist any more.
>> + * It is caller's responsibility to call putback_movable_pages() to return folios
>> + * to the LRU or free list only if ret != 0.
>> + *
>> + * Returns the number of {normal folio, large folio, hugetlb} that were not
>> + * migrated, or an error code. The number of large folio splits will be
>> + * considered as the number of non-migrated large folio, no matter how many
>> + * split folios of the large folio are migrated successfully.
>> + */
>> +int migrate_pages(struct list_head *from, new_page_t get_new_page,
>> +		free_page_t put_new_page, unsigned long private,
>> +		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
>> +{
>> +	int rc, rc_gether;
>
> rc_gether -> rc_gather?

Good catch!  Thanks!  Will change this in the next version.

Best Regards,
Huang, Ying

>> +	int nr_pages;
>> +	struct folio *folio, *folio2;
>> +	LIST_HEAD(folios);
>> +	LIST_HEAD(ret_folios);
>> +	struct migrate_pages_stats stats;
>> +

[snip]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/8] migrate_pages: batch _unmap and _move
  2023-01-03 19:01   ` Zi Yan
@ 2023-01-04  0:34     ` Huang, Ying
  0 siblings, 0 replies; 39+ messages in thread
From: Huang, Ying @ 2023-01-04  0:34 UTC (permalink / raw)
  To: Zi Yan
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

Zi Yan <ziy@nvidia.com> writes:

> On 26 Dec 2022, at 19:28, Huang Ying wrote:
>
>> In this patch the _unmap and _move stage of the folio migration is
>> batched.  That for, previously, it is,
>>
>>   for each folio
>>     _unmap()
>>     _move()
>>
>> Now, it is,
>>
>>   for each folio
>>     _unmap()
>>   for each folio
>>     _move()
>
> It is also worth adding a note here: we need extra code to undo the _unmap()
> if _move() fails. Andrew has asked for comments on *_undo_src/dst(),
> but I think it would be better to provide a high-level description of the new
> workflow, in the form of pseudo code, in the git log and in the comment for
> migrate_pages(). The extra cleanup needed when _move() fails after a
> previously successful _unmap() might not be obvious to everyone.

Here, I removed the detailed error processing to make the basic flow
change easier to understand.  So I would rather keep the pseudo code
here as simple as possible and leave the detailed flow to the diff
below.

Best Regards,
Huang, Ying

>>
>> Based on this, we can batch the TLB flushing and use some hardware
>> accelerator to copy folios between batched _unmap and batched _move
>> stages.
>>

[snip]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/8] migrate_pages: move migrate_folio_done() and migrate_folio_unmap()
  2023-01-03 19:02   ` Zi Yan
@ 2023-01-04  1:26     ` Huang, Ying
  0 siblings, 0 replies; 39+ messages in thread
From: Huang, Ying @ 2023-01-04  1:26 UTC (permalink / raw)
  To: Zi Yan
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

Zi Yan <ziy@nvidia.com> writes:

> On 26 Dec 2022, at 19:28, Huang Ying wrote:
>
>> Just move the position of 2 functions.  There's no any functionality
>> change.  This is to make it easier to review the next patch via
>> putting code near its position in the next patch.
>
> This probably can be merged into prior patches.

I will check whether this makes the patches harder to review.  If
not, I will merge them.  Previously, I had merged this patch with the
next patch in the series, and it turned out that made the patches much
harder to review.

Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 8/8] migrate_pages: batch flushing TLB
  2023-01-03 19:19   ` Zi Yan
@ 2023-01-04  1:41     ` Huang, Ying
  0 siblings, 0 replies; 39+ messages in thread
From: Huang, Ying @ 2023-01-04  1:41 UTC (permalink / raw)
  To: Zi Yan
  Cc: Andrew Morton, linux-mm, linux-kernel, Yang Shi, Baolin Wang,
	Oscar Salvador, Matthew Wilcox, Bharata B Rao, Alistair Popple,
	haoxin

Zi Yan <ziy@nvidia.com> writes:

> On 26 Dec 2022, at 19:28, Huang Ying wrote:
>
>> The TLB flushing will cost quite some CPU cycles during the folio
>> migration in some situations.  For example, when migrate a folio of a
>> process with multiple active threads that run on multiple CPUs.  After
>> batching the _unmap and _move in migrate_pages(), the TLB flushing can
>> be batched easily with the existing TLB flush batching mechanism.
>> This patch implements that.
>>
>> We use the following test case to test the patch.
>>
>> On a 2-socket Intel server,
>>
>> - Run pmbench memory accessing benchmark
>>
>> - Run `migratepages` to migrate pages of pmbench between node 0 and
>>   node 1 back and forth.
>>
>> With the patch, the TLB flushing IPI reduces 99.1% during the test and
>> the number of pages migrated successfully per second increases 291.7%.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Bharata B Rao <bharata@amd.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: haoxin <xhao@linux.alibaba.com>
>> ---
>>  mm/migrate.c |  4 +++-
>>  mm/rmap.c    | 20 +++++++++++++++++---
>>  2 files changed, 20 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 70a40b8fee1f..d7413164e748 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1215,7 +1215,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>>  		/* Establish migration ptes */
>>  		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
>>  			       !folio_test_ksm(src) && !anon_vma, src);
>> -		try_to_migrate(src, 0);
>> +		try_to_migrate(src, TTU_BATCH_FLUSH);
>>  		page_was_mapped = 1;
>>  	}
>>
>> @@ -1732,6 +1732,8 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>>  	stats->nr_thp_failed += thp_retry;
>>  	stats->nr_failed_pages += nr_retry_pages;
>>  move:
>> +	try_to_unmap_flush();
>> +
>>  	retry = 1;
>>  	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
>>  		retry = 0;
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index b616870a09be..2e125f3e462e 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1976,7 +1976,21 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>  		} else {
>>  			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>>  			/* Nuke the page table entry. */
>> -			pteval = ptep_clear_flush(vma, address, pvmw.pte);
>> +			if (should_defer_flush(mm, flags)) {
>> +				/*
>> +				 * We clear the PTE but do not flush so potentially
>> +				 * a remote CPU could still be writing to the folio.
>> +				 * If the entry was previously clean then the
>> +				 * architecture must guarantee that a clear->dirty
>> +				 * transition on a cached TLB entry is written through
>> +				 * and traps if the PTE is unmapped.
>> +				 */
>> +				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>> +
>> +				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
>> +			} else {
>> +				pteval = ptep_clear_flush(vma, address, pvmw.pte);
>> +			}
>>  		}
>>
>
> This is only for PTE mapped pages, right? We also need something similar
> in set_pmd_migration_entry() in mm/huge_memory.c for PMD-mapped THPs.
> Oh, since you limit NR_MAX_BATCHED_MIGRATION to HPAGE_PMD_NR and count
> nr_pages with folio_nr_pages(), THPs will only be migrated one by one.
> This is not obvious from the cover letter.
>
> Are you planning to support batched THP migration? If not, it might be
> better to update cover letter to be explicit about it and add comments
> in migrate_pages(). It would be nice to also note that we need to
> increase NR_MAX_BATCHED_MIGRATION beyond HPAGE_PMD_NR and make similar
> changes in set_pmd_migration_entry() to get batched THP migration support.

For now, I have no plan to support batching THP migration, because the
TLB shootdown overhead of a THP is only 1/512 of that of a 4KB normal
page.  I will add some words about that to the patch description.
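As a rough back-of-the-envelope (assuming x86-64 with HPAGE_PMD_NR ==
512): migrating 2MB as 512 individual 4KB pages without batching can
cost up to 512 TLB-flush IPIs, while migrating the same 2MB as a single
THP needs only one flush, so the per-byte shootdown overhead of THP
migration is about 1/512 of the 4KB case.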

Best Regards,
Huang, Ying

>>  		/* Set the dirty flag on the folio now the pte is gone. */
>> @@ -2148,10 +2162,10 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
>>
>>  	/*
>>  	 * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and
>> -	 * TTU_SPLIT_HUGE_PMD and TTU_SYNC flags.
>> +	 * TTU_SPLIT_HUGE_PMD, TTU_SYNC, and TTU_BATCH_FLUSH flags.
>>  	 */
>>  	if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
>> -					TTU_SYNC)))
>> +					TTU_SYNC | TTU_BATCH_FLUSH)))
>>  		return;
>>
>>  	if (folio_is_zone_device(folio) &&
>> -- 
>> 2.35.1

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/8] migrate_pages: share more code between _unmap and _move
  2022-12-27  0:28 ` [PATCH 7/8] migrate_pages: share more code between _unmap and _move Huang Ying
@ 2023-01-04  7:12   ` Alistair Popple
  2023-01-06  4:15     ` Huang, Ying
  0 siblings, 1 reply; 39+ messages in thread
From: Alistair Popple @ 2023-01-04  7:12 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin


Huang Ying <ying.huang@intel.com> writes:

> This is a code cleanup patch to reduce the duplicated code between the
> _unmap and _move stages of migrate_pages().  No functionality change
> is expected.
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Bharata B Rao <bharata@amd.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: haoxin <xhao@linux.alibaba.com>
> ---
>  mm/migrate.c | 208 ++++++++++++++++++++-------------------------------
>  1 file changed, 82 insertions(+), 126 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 70b987391296..70a40b8fee1f 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1030,21 +1030,26 @@ static void __migrate_folio_extract(struct folio *dst,
>  static void migrate_folio_undo_src(struct folio *src,
>  				   int page_was_mapped,
>  				   struct anon_vma *anon_vma,
> +				   bool locked,
>  				   struct list_head *ret)
>  {
>  	if (page_was_mapped)
>  		remove_migration_ptes(src, src, false);
>  	if (anon_vma)
>  		put_anon_vma(anon_vma);
> -	folio_unlock(src);
> -	list_move_tail(&src->lru, ret);
> +	if (locked)
> +		folio_unlock(src);
> +	if (ret)
> +		list_move_tail(&src->lru, ret);
>  }
>  
>  static void migrate_folio_undo_dst(struct folio *dst,
> +				   bool locked,
>  				   free_page_t put_new_page,
>  				   unsigned long private)
>  {
> -	folio_unlock(dst);
> +	if (locked)
> +		folio_unlock(dst);
>  	if (put_new_page)
>  		put_new_page(&dst->page, private);
>  	else
> @@ -1068,14 +1073,44 @@ static void migrate_folio_done(struct folio *src,
>  		folio_put(src);
>  }
>  
> -static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
> -				 int force, bool force_lock, enum migrate_mode mode)
> +/* Obtain the lock on page, remove all ptes. */
> +static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
> +			       unsigned long private, struct folio *src,
> +			       struct folio **dstp, int force, bool force_lock,
> +			       enum migrate_mode mode, enum migrate_reason reason,
> +			       struct list_head *ret)

Overall I think this should be refactored into some smaller, simpler
functions, as the error handling and the giant switch statement in
migrate_pages_batch() are making my head hurt :-)

>  {
> -	int rc = -EAGAIN;
> +	struct folio *dst;
> +	int rc = MIGRATEPAGE_UNMAP;
> +	struct page *newpage = NULL;
>  	int page_was_mapped = 0;
>  	struct anon_vma *anon_vma = NULL;
>  	bool is_lru = !__PageMovable(&src->page);
> +	bool locked = false;
> +	bool dst_locked = false;
> +
> +	if (!thp_migration_supported() && folio_test_transhuge(src))
> +		return -ENOSYS;

This would be easier to follow if it was just moved to the caller and
the -ENOSYS switch case removed.

> +	if (folio_ref_count(src) == 1) {
> +		/* Folio was freed from under us. So we are done. */
> +		folio_clear_active(src);
> +		folio_clear_unevictable(src);
> +		/* free_pages_prepare() will clear PG_isolated. */
> +		list_del(&src->lru);
> +		migrate_folio_done(src, reason);
> +		return MIGRATEPAGE_SUCCESS;
> +	}

This is the only case that returns MIGRATEPAGE_SUCCESS, so it would also
be clearer if moved to the caller, eliminating another switch case.
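Roughly like this in the caller, before calling migrate_folio_unmap()
(sketch only, reusing the logic and the stats handling that are already
there):

			if (folio_ref_count(folio) == 1) {
				/* Folio was freed from under us. So we are done. */
				folio_clear_active(folio);
				folio_clear_unevictable(folio);
				/* free_pages_prepare() will clear PG_isolated. */
				list_del(&folio->lru);
				migrate_folio_done(folio, reason);
				stats->nr_succeeded += nr_pages;
				stats->nr_thp_succeeded += is_thp;
				continue;
			}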

> +
> +	newpage = get_new_page(&src->page, private);
> +	if (!newpage)
> +		return -ENOMEM;
> +	dst = page_folio(newpage);
> +	*dstp = dst;
> +
> +	dst->private = NULL;

This could be moved until after the folio_test_writeback(), which might
make the split I suggest below easier.

>  
> +	rc = -EAGAIN;

We can just initialise rc to -EAGAIN.

>  	if (!folio_trylock(src)) {
>  		if (!force || mode == MIGRATE_ASYNC)
>  			goto out;
> @@ -1103,6 +1138,7 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>  
>  		folio_lock(src);
>  	}
> +	locked = true;

A separate helper for locking the folio would be better IMHO.

>  
>  	if (folio_test_writeback(src)) {
>  		/*
> @@ -1117,10 +1153,10 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>  			break;
>  		default:
>  			rc = -EBUSY;
> -			goto out_unlock;
> +			goto out;
>  		}
>  		if (!force)
> -			goto out_unlock;
> +			goto out;
>  		folio_wait_writeback(src);
>  	}

This is the only path that returns -EBUSY, so it could be integrated into
the helper suggested above for locking the folio.
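Something along these lines perhaps (untested, name made up; it returns
with src unlocked on failure, so the caller only has to fix up the
lists):

/* Returns 0 with src locked, or -EAGAIN/-EDEADLOCK/-EBUSY with src unlocked. */
static int migrate_folio_lock_src(struct folio *src, int force,
				  bool force_lock, enum migrate_mode mode)
{
	if (!folio_trylock(src)) {
		if (!force || mode == MIGRATE_ASYNC)
			return -EAGAIN;
		/* See the PF_MEMALLOC comment in the existing code. */
		if (current->flags & PF_MEMALLOC)
			return -EAGAIN;
		if (!force_lock)
			return -EDEADLOCK;
		folio_lock(src);
	}

	if (folio_test_writeback(src)) {
		switch (mode) {
		case MIGRATE_SYNC:
		case MIGRATE_SYNC_NO_COPY:
			break;
		default:
			folio_unlock(src);
			return -EBUSY;
		}
		if (!force) {
			folio_unlock(src);
			return -EAGAIN;
		}
		folio_wait_writeback(src);
	}

	return 0;
}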
  
> @@ -1150,7 +1186,8 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>  	 * This is much like races on refcount of oldpage: just don't BUG().
>  	 */
>  	if (unlikely(!folio_trylock(dst)))
> -		goto out_unlock;
> +		goto out;
> +	dst_locked = true;

So how about splitting migrate_folio_unmap() into two functions:

/*
 * Prepare a folio for migration by locking the source, ensuring
 * writeback is complete and allocating and locking a new destination
 * page.
 */
migrate_folio_prepare(new_page_t get_new_page, free_page_t put_new_page,
                      unsigned long private, struct folio *src,
	              struct folio **dstp, int force, bool force_lock)

migrate_folio_unmap(struct folio *src, struct folio *dst,
                    enum migrate_mode mode, enum migrate_reason reason)

Obviously we still have the various failure scenarios to deal with, but
I think it would be more readable if these were limited to undoing the
migrate_folio_prepare() step in the caller. I think the list
manipulation would also be more obvious if left to the caller.
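So the caller would end up looking roughly like this (sketch only,
assuming prepare returns 0 on success):

			rc = migrate_folio_prepare(get_new_page, put_new_page,
						   private, folio, &dst,
						   pass > 2, force_lock);
			if (!rc)
				rc = migrate_folio_unmap(folio, dst, mode, reason);
			/*
			 * On any failure the caller undoes the prepare step
			 * (migrate_folio_undo_src()/_undo_dst()) and does the
			 * list_move_tail() to ret_folios itself.
			 */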

>  	if (unlikely(!is_lru)) {
>  		__migrate_folio_record(dst, page_was_mapped, anon_vma);
> @@ -1172,7 +1209,7 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>  	if (!src->mapping) {
>  		if (folio_test_private(src)) {
>  			try_to_free_buffers(src);
> -			goto out_unlock_both;
> +			goto out;
>  		}
>  	} else if (folio_mapped(src)) {
>  		/* Establish migration ptes */
> @@ -1187,75 +1224,27 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>  		return MIGRATEPAGE_UNMAP;

I realise this is pre-existing, but the mixing of setting rc and
returning error codes directly is a bit hard to follow.

>  	}
>  
> -
> -	if (page_was_mapped)
> -		remove_migration_ptes(src, src, false);
> -
> -out_unlock_both:
> -	folio_unlock(dst);
> -out_unlock:
> -	/* Drop an anon_vma reference if we took one */
> -	if (anon_vma)
> -		put_anon_vma(anon_vma);
> -	folio_unlock(src);
>  out:
> -
> -	return rc;
> -}
> -
> -/* Obtain the lock on page, remove all ptes. */
> -static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
> -			       unsigned long private, struct folio *src,
> -			       struct folio **dstp, int force, bool force_lock,
> -			       enum migrate_mode mode, enum migrate_reason reason,
> -			       struct list_head *ret)
> -{
> -	struct folio *dst;
> -	int rc = MIGRATEPAGE_UNMAP;
> -	struct page *newpage = NULL;
> -
> -	if (!thp_migration_supported() && folio_test_transhuge(src))
> -		return -ENOSYS;
> -
> -	if (folio_ref_count(src) == 1) {
> -		/* Folio was freed from under us. So we are done. */
> -		folio_clear_active(src);
> -		folio_clear_unevictable(src);
> -		/* free_pages_prepare() will clear PG_isolated. */
> -		list_del(&src->lru);
> -		migrate_folio_done(src, reason);
> -		return MIGRATEPAGE_SUCCESS;
> -	}
> -
> -	newpage = get_new_page(&src->page, private);
> -	if (!newpage)
> -		return -ENOMEM;
> -	dst = page_folio(newpage);
> -	*dstp = dst;
> -
> -	dst->private = NULL;
> -	rc = __migrate_folio_unmap(src, dst, force, force_lock, mode);
> -	if (rc == MIGRATEPAGE_UNMAP)
> -		return rc;
> -
>  	/*
>  	 * A page that has not been migrated will have kept its
>  	 * references and be restored.
>  	 */
>  	/* restore the folio to right list. */
> -	if (rc != -EAGAIN && rc != -EDEADLOCK)
> -		list_move_tail(&src->lru, ret);
> +	if (rc == -EAGAIN || rc == -EDEADLOCK)
> +		ret = NULL;
>  
> -	if (put_new_page)
> -		put_new_page(&dst->page, private);
> -	else
> -		folio_put(dst);
> +	migrate_folio_undo_src(src, page_was_mapped, anon_vma, locked, ret);
> +	if (dst)
> +		migrate_folio_undo_dst(dst, dst_locked, put_new_page, private);
>  
>  	return rc;
>  }
>  
> -static int __migrate_folio_move(struct folio *src, struct folio *dst,
> -				enum migrate_mode mode)
> +/* Migrate the folio to the newly allocated folio in dst. */
> +static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
> +			      struct folio *src, struct folio *dst,
> +			      enum migrate_mode mode, enum migrate_reason reason,
> +			      struct list_head *ret)
>  {
>  	int rc;
>  	int page_was_mapped = 0;
> @@ -1264,9 +1253,10 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
>  	__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
>  
>  	rc = move_to_new_folio(dst, src, mode);
> +	if (rc)
> +		goto out;
>  
> -	if (rc != -EAGAIN)
> -		list_del(&dst->lru);
> +	list_del(&dst->lru);
>  	/*
>  	 * When successful, push dst to LRU immediately: so that if it
>  	 * turns out to be an mlocked page, remove_migration_ptes() will
> @@ -1276,74 +1266,40 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
>  	 * unsuccessful, and other cases when a page has been temporarily
>  	 * isolated from the unevictable LRU: but this case is the easiest.
>  	 */
> -	if (rc == MIGRATEPAGE_SUCCESS) {
> -		folio_add_lru(dst);
> -		if (page_was_mapped)
> -			lru_add_drain();
> -	}
> -
> -	if (rc == -EAGAIN) {
> -		__migrate_folio_record(dst, page_was_mapped, anon_vma);
> -		return rc;
> -	}
> -
> +	folio_add_lru(dst);
>  	if (page_was_mapped)
> -		remove_migration_ptes(src,
> -			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
> +		lru_add_drain();
>  
> +	if (page_was_mapped)
> +		remove_migration_ptes(src, dst, false);
>  	folio_unlock(dst);
> -	/* Drop an anon_vma reference if we took one */
> -	if (anon_vma)
> -		put_anon_vma(anon_vma);
> -	folio_unlock(src);
> +	set_page_owner_migrate_reason(&dst->page, reason);
>  	/*
>  	 * If migration is successful, decrease refcount of dst,
>  	 * which will not free the page because new page owner increased
>  	 * refcounter.
>  	 */
> -	if (rc == MIGRATEPAGE_SUCCESS)
> -		folio_put(dst);
> -
> -	return rc;
> -}
> -
> -/* Migrate the folio to the newly allocated folio in dst. */
> -static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
> -			      struct folio *src, struct folio *dst,
> -			      enum migrate_mode mode, enum migrate_reason reason,
> -			      struct list_head *ret)
> -{
> -	int rc;
> -
> -	rc = __migrate_folio_move(src, dst, mode);
> -	if (rc == MIGRATEPAGE_SUCCESS)
> -		set_page_owner_migrate_reason(&dst->page, reason);
> -
> -	if (rc != -EAGAIN) {
> -		/*
> -		 * A folio that has been migrated has all references
> -		 * removed and will be freed. A folio that has not been
> -		 * migrated will have kept its references and be restored.
> -		 */
> -		list_del(&src->lru);
> -	}
> +	folio_put(dst);
>  
>  	/*
> -	 * If migration is successful, releases reference grabbed during
> -	 * isolation. Otherwise, restore the folio to right list unless
> -	 * we want to retry.
> +	 * A page that has been migrated has all references removed
> +	 * and will be freed.
>  	 */
> -	if (rc == MIGRATEPAGE_SUCCESS) {
> -		migrate_folio_done(src, reason);
> -	} else if (rc != -EAGAIN) {
> -		list_add_tail(&src->lru, ret);
> +	list_del(&src->lru);
> +	migrate_folio_undo_src(src, 0, anon_vma, true, NULL);
> +	migrate_folio_done(src, reason);
>  
> -		if (put_new_page)
> -			put_new_page(&dst->page, private);
> -		else
> -			folio_put(dst);
> +	return rc;
> +out:
> +	if (rc == -EAGAIN) {
> +		__migrate_folio_record(dst, page_was_mapped, anon_vma);
> +		return rc;
>  	}
>  
> +	migrate_folio_undo_src(src, page_was_mapped, anon_vma, true, ret);
> +	list_del(&dst->lru);
> +	migrate_folio_undo_dst(dst, true, put_new_page, private);
> +
>  	return rc;
>  }
>  
> @@ -1849,9 +1805,9 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  
>  		__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
>  		migrate_folio_undo_src(folio, page_was_mapped, anon_vma,
> -				       ret_folios);
> +				       true, ret_folios);
>  		list_del(&dst->lru);
> -		migrate_folio_undo_dst(dst, put_new_page, private);
> +		migrate_folio_undo_dst(dst, true, put_new_page, private);
>  		dst = dst2;
>  		dst2 = list_next_entry(dst, lru);
>  	}


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/8] migrate_pages: organize stats with struct migrate_pages_stats
  2022-12-27  0:28 ` [PATCH 1/8] migrate_pages: organize stats with struct migrate_pages_stats Huang Ying
  2023-01-03 18:06   ` Zi Yan
@ 2023-01-05  3:02   ` Alistair Popple
  2023-01-05  5:53     ` Huang, Ying
  1 sibling, 1 reply; 39+ messages in thread
From: Alistair Popple @ 2023-01-05  3:02 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin


Huang Ying <ying.huang@intel.com> writes:

> Define struct migrate_pages_stats to organize the various statistics
> in migrate_pages().  This makes it easier to collect and consume the
> statistics in multiple functions.  This will be needed in the
> following patches in the series.
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Bharata B Rao <bharata@amd.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: haoxin <xhao@linux.alibaba.com>
> ---
>  mm/migrate.c | 58 +++++++++++++++++++++++++++++-----------------------
>  1 file changed, 32 insertions(+), 26 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index a4d3fc65085f..ec9263a33d38 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1396,6 +1396,14 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
>  	return rc;
>  }
>  
> +struct migrate_pages_stats {
> +	int nr_succeeded;
> +	int nr_failed_pages;
> +	int nr_thp_succeeded;
> +	int nr_thp_failed;
> +	int nr_thp_split;

I think some brief comments in the code for what each stat is tracking
and their relationship to each other would be helpful (ie. does
nr_succeeded include thp subpages, etc). Or at least a reference to
where this is documented (ie. page_migration.rst) as I recall there has
been some confusion in the past that has led to bugs.
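Something like this, assuming I have read the accounting right:

struct migrate_pages_stats {
	int nr_succeeded;	/* Normal and large folios migrated successfully,
				 * in units of base pages */
	int nr_failed_pages;	/* Normal and large folios failed to be migrated,
				 * in units of base pages. Untried folios aren't
				 * counted */
	int nr_thp_succeeded;	/* THP migrated successfully */
	int nr_thp_failed;	/* THP failed to be migrated */
	int nr_thp_split;	/* THP split before migrating */
};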

Otherwise the patch looks good so:

Reviewed-by: Alistair Popple <apopple@nvidia.com>

> +};
> +
>  /*
>   * migrate_pages - migrate the folios specified in a list, to the free folios
>   *		   supplied as the target for the page migration
> @@ -1430,13 +1438,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  	int large_retry = 1;
>  	int thp_retry = 1;
>  	int nr_failed = 0;
> -	int nr_failed_pages = 0;
>  	int nr_retry_pages = 0;
> -	int nr_succeeded = 0;
> -	int nr_thp_succeeded = 0;
>  	int nr_large_failed = 0;
> -	int nr_thp_failed = 0;
> -	int nr_thp_split = 0;
>  	int pass = 0;
>  	bool is_large = false;
>  	bool is_thp = false;
> @@ -1446,9 +1449,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  	LIST_HEAD(split_folios);
>  	bool nosplit = (reason == MR_NUMA_MISPLACED);
>  	bool no_split_folio_counting = false;
> +	struct migrate_pages_stats stats;
>  
>  	trace_mm_migrate_pages_start(mode, reason);
>  
> +	memset(&stats, 0, sizeof(stats));
>  split_folio_migration:
>  	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
>  		retry = 0;
> @@ -1502,9 +1507,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				/* Large folio migration is unsupported */
>  				if (is_large) {
>  					nr_large_failed++;
> -					nr_thp_failed += is_thp;
> +					stats.nr_thp_failed += is_thp;
>  					if (!try_split_folio(folio, &split_folios)) {
> -						nr_thp_split += is_thp;
> +						stats.nr_thp_split += is_thp;
>  						break;
>  					}
>  				/* Hugetlb migration is unsupported */
> @@ -1512,7 +1517,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  					nr_failed++;
>  				}
>  
> -				nr_failed_pages += nr_pages;
> +				stats.nr_failed_pages += nr_pages;
>  				list_move_tail(&folio->lru, &ret_folios);
>  				break;
>  			case -ENOMEM:
> @@ -1522,13 +1527,13 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				 */
>  				if (is_large) {
>  					nr_large_failed++;
> -					nr_thp_failed += is_thp;
> +					stats.nr_thp_failed += is_thp;
>  					/* Large folio NUMA faulting doesn't split to retry. */
>  					if (!nosplit) {
>  						int ret = try_split_folio(folio, &split_folios);
>  
>  						if (!ret) {
> -							nr_thp_split += is_thp;
> +							stats.nr_thp_split += is_thp;
>  							break;
>  						} else if (reason == MR_LONGTERM_PIN &&
>  							   ret == -EAGAIN) {
> @@ -1546,7 +1551,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  					nr_failed++;
>  				}
>  
> -				nr_failed_pages += nr_pages + nr_retry_pages;
> +				stats.nr_failed_pages += nr_pages + nr_retry_pages;
>  				/*
>  				 * There might be some split folios of fail-to-migrate large
>  				 * folios left in split_folios list. Move them back to migration
> @@ -1556,7 +1561,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				list_splice_init(&split_folios, from);
>  				/* nr_failed isn't updated for not used */
>  				nr_large_failed += large_retry;
> -				nr_thp_failed += thp_retry;
> +				stats.nr_thp_failed += thp_retry;
>  				goto out;
>  			case -EAGAIN:
>  				if (is_large) {
> @@ -1568,8 +1573,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				nr_retry_pages += nr_pages;
>  				break;
>  			case MIGRATEPAGE_SUCCESS:
> -				nr_succeeded += nr_pages;
> -				nr_thp_succeeded += is_thp;
> +				stats.nr_succeeded += nr_pages;
> +				stats.nr_thp_succeeded += is_thp;
>  				break;
>  			default:
>  				/*
> @@ -1580,20 +1585,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				 */
>  				if (is_large) {
>  					nr_large_failed++;
> -					nr_thp_failed += is_thp;
> +					stats.nr_thp_failed += is_thp;
>  				} else if (!no_split_folio_counting) {
>  					nr_failed++;
>  				}
>  
> -				nr_failed_pages += nr_pages;
> +				stats.nr_failed_pages += nr_pages;
>  				break;
>  			}
>  		}
>  	}
>  	nr_failed += retry;
>  	nr_large_failed += large_retry;
> -	nr_thp_failed += thp_retry;
> -	nr_failed_pages += nr_retry_pages;
> +	stats.nr_thp_failed += thp_retry;
> +	stats.nr_failed_pages += nr_retry_pages;
>  	/*
>  	 * Try to migrate split folios of fail-to-migrate large folios, no
>  	 * nr_failed counting in this round, since all split folios of a
> @@ -1626,16 +1631,17 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  	if (list_empty(from))
>  		rc = 0;
>  
> -	count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
> -	count_vm_events(PGMIGRATE_FAIL, nr_failed_pages);
> -	count_vm_events(THP_MIGRATION_SUCCESS, nr_thp_succeeded);
> -	count_vm_events(THP_MIGRATION_FAIL, nr_thp_failed);
> -	count_vm_events(THP_MIGRATION_SPLIT, nr_thp_split);
> -	trace_mm_migrate_pages(nr_succeeded, nr_failed_pages, nr_thp_succeeded,
> -			       nr_thp_failed, nr_thp_split, mode, reason);
> +	count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded);
> +	count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages);
> +	count_vm_events(THP_MIGRATION_SUCCESS, stats.nr_thp_succeeded);
> +	count_vm_events(THP_MIGRATION_FAIL, stats.nr_thp_failed);
> +	count_vm_events(THP_MIGRATION_SPLIT, stats.nr_thp_split);
> +	trace_mm_migrate_pages(stats.nr_succeeded, stats.nr_failed_pages,
> +			       stats.nr_thp_succeeded, stats.nr_thp_failed,
> +			       stats.nr_thp_split, mode, reason);
>  
>  	if (ret_succeeded)
> -		*ret_succeeded = nr_succeeded;
> +		*ret_succeeded = stats.nr_succeeded;
>  
>  	return rc;
>  }


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/8] migrate_pages: separate hugetlb folios migration
  2022-12-27  0:28 ` [PATCH 2/8] migrate_pages: separate hugetlb folios migration Huang Ying
  2022-12-28 23:17   ` Andrew Morton
@ 2023-01-05  4:13   ` Alistair Popple
  2023-01-05  5:51     ` Huang, Ying
  1 sibling, 1 reply; 39+ messages in thread
From: Alistair Popple @ 2023-01-05  4:13 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin


Huang Ying <ying.huang@intel.com> writes:

> This is a preparation patch to batch the folio unmapping and moving
> for the non-hugetlb folios.  Based on that we can batch the TLB
> shootdown during the folio migration and make it possible to use some
> hardware accelerator for the folio copying.
>
> In this patch the hugetlb folios and non-hugetlb folios migration is
> separated in migrate_pages() to make it easy to change the non-hugetlb
> folios migration implementation.
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Bharata B Rao <bharata@amd.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: haoxin <xhao@linux.alibaba.com>
> ---
>  mm/migrate.c | 114 ++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 99 insertions(+), 15 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ec9263a33d38..bdbe73fe2eb7 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1404,6 +1404,87 @@ struct migrate_pages_stats {
>  	int nr_thp_split;
>  };
>  
> +static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
> +			    free_page_t put_new_page, unsigned long private,
> +			    enum migrate_mode mode, int reason,
> +			    struct migrate_pages_stats *stats,
> +			    struct list_head *ret_folios)
> +{
> +	int retry = 1;
> +	int nr_failed = 0;
> +	int nr_retry_pages = 0;
> +	int pass = 0;
> +	struct folio *folio, *folio2;
> +	int rc = 0, nr_pages;
> +
> +	for (pass = 0; pass < 10 && retry; pass++) {
> +		retry = 0;
> +		nr_retry_pages = 0;
> +
> +		list_for_each_entry_safe(folio, folio2, from, lru) {
> +			if (!folio_test_hugetlb(folio))
> +				continue;
> +
> +			nr_pages = folio_nr_pages(folio);
> +
> +			cond_resched();
> +
> +			rc = unmap_and_move_huge_page(get_new_page,
> +						      put_new_page, private,
> +						      &folio->page, pass > 2, mode,
> +						      reason, ret_folios);
> +			/*
> +			 * The rules are:
> +			 *	Success: hugetlb folio will be put back
> +			 *	-EAGAIN: stay on the from list
> +			 *	-ENOMEM: stay on the from list
> +			 *	-ENOSYS: stay on the from list
> +			 *	Other errno: put on ret_folios list
> +			 */
> +			switch(rc) {
> +			case -ENOSYS:
> +				/* Hugetlb migration is unsupported */
> +				nr_failed++;
> +				stats->nr_failed_pages += nr_pages;
> +				list_move_tail(&folio->lru, ret_folios);
> +				break;
> +			case -ENOMEM:
> +				/*
> +				 * When memory is low, don't bother to try to migrate
> +				 * other folios, just exit.
> +				 */
> +				nr_failed++;

This currently isn't relevant for -ENOMEM and I think it would be
clearer if it was dropped.

> +				stats->nr_failed_pages += nr_pages;

Makes sense not to continue migration with low memory, but shouldn't we
add the remaining unmigrated hugetlb folios to stats->nr_failed_pages as
well? Ie. don't we still have to continue the iteration to find and
account for these?

> +				goto out;

Given this is the only use of the out label, and that there is a special
case for -ENOMEM there anyway, I think it would be clearer to return
directly.
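i.e. something like this (sketch only):

			case -ENOMEM:
				/*
				 * When memory is low, don't bother to try to
				 * migrate other folios, just exit.
				 */
				stats->nr_failed_pages += nr_pages + nr_retry_pages;
				return -ENOMEM;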

> +			case -EAGAIN:
> +				retry++;
> +				nr_retry_pages += nr_pages;
> +				break;
> +			case MIGRATEPAGE_SUCCESS:
> +				stats->nr_succeeded += nr_pages;
> +				break;
> +			default:
> +				/*
> +				 * Permanent failure (-EBUSY, etc.):
> +				 * unlike -EAGAIN case, the failed folio is
> +				 * removed from migration folio list and not
> +				 * retried in the next outer loop.
> +				 */
> +				nr_failed++;
> +				stats->nr_failed_pages += nr_pages;
> +				break;
> +			}
> +		}
> +	}
> +out:
> +	nr_failed += retry;
> +	stats->nr_failed_pages += nr_retry_pages;
> +	if (rc != -ENOMEM)
> +		rc = nr_failed;
> +
> +	return rc;
> +}
> +
>  /*
>   * migrate_pages - migrate the folios specified in a list, to the free folios
>   *		   supplied as the target for the page migration
> @@ -1437,7 +1518,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  	int retry = 1;
>  	int large_retry = 1;
>  	int thp_retry = 1;
> -	int nr_failed = 0;
> +	int nr_failed;
>  	int nr_retry_pages = 0;
>  	int nr_large_failed = 0;
>  	int pass = 0;
> @@ -1454,6 +1535,12 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  	trace_mm_migrate_pages_start(mode, reason);
>  
>  	memset(&stats, 0, sizeof(stats));
> +	rc = migrate_hugetlbs(from, get_new_page, put_new_page, private, mode, reason,
> +			      &stats, &ret_folios);
> +	if (rc < 0)
> +		goto out;
> +	nr_failed = rc;
> +
>  split_folio_migration:
>  	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
>  		retry = 0;
> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  		nr_retry_pages = 0;
>  
>  		list_for_each_entry_safe(folio, folio2, from, lru) {
> +			if (folio_test_hugetlb(folio)) {

How do we hit this case? Shouldn't migrate_hugetlbs() have already moved
any hugetlb folios off the from list?

> +				list_move_tail(&folio->lru, &ret_folios);
> +				continue;
> +			}
> +
>  			/*
>  			 * Large folio statistics is based on the source large
>  			 * folio. Capture required information that might get
>  			 * lost during migration.
>  			 */
> -			is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
> +			is_large = folio_test_large(folio);
>  			is_thp = is_large && folio_test_pmd_mappable(folio);
>  			nr_pages = folio_nr_pages(folio);
> +
>  			cond_resched();
>  
> -			if (folio_test_hugetlb(folio))
> -				rc = unmap_and_move_huge_page(get_new_page,
> -						put_new_page, private,
> -						&folio->page, pass > 2, mode,
> -						reason,
> -						&ret_folios);
> -			else
> -				rc = unmap_and_move(get_new_page, put_new_page,
> -						private, folio, pass > 2, mode,
> -						reason, &ret_folios);
> +			rc = unmap_and_move(get_new_page, put_new_page,
> +					    private, folio, pass > 2, mode,
> +					    reason, &ret_folios);
>  			/*
>  			 * The rules are:
> -			 *	Success: non hugetlb folio will be freed, hugetlb
> -			 *		 folio will be put back
> +			 *	Success: folio will be freed
>  			 *	-EAGAIN: stay on the from list
>  			 *	-ENOMEM: stay on the from list
>  			 *	-ENOSYS: stay on the from list
> @@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  						stats.nr_thp_split += is_thp;
>  						break;
>  					}
> -				/* Hugetlb migration is unsupported */
>  				} else if (!no_split_folio_counting) {
>  					nr_failed++;
>  				}


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/8] migrate_pages: separate hugetlb folios migration
  2023-01-05  4:13   ` Alistair Popple
@ 2023-01-05  5:51     ` Huang, Ying
  2023-01-05  6:43       ` Alistair Popple
  0 siblings, 1 reply; 39+ messages in thread
From: Huang, Ying @ 2023-01-05  5:51 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin

Alistair Popple <apopple@nvidia.com> writes:

> Huang Ying <ying.huang@intel.com> writes:
>
>> This is a preparation patch to batch the folio unmapping and moving
>> for the non-hugetlb folios.  Based on that we can batch the TLB
>> shootdown during the folio migration and make it possible to use some
>> hardware accelerator for the folio copying.
>>
>> In this patch the hugetlb folios and non-hugetlb folios migration is
>> separated in migrate_pages() to make it easy to change the non-hugetlb
>> folios migration implementation.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Bharata B Rao <bharata@amd.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: haoxin <xhao@linux.alibaba.com>
>> ---
>>  mm/migrate.c | 114 ++++++++++++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 99 insertions(+), 15 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index ec9263a33d38..bdbe73fe2eb7 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1404,6 +1404,87 @@ struct migrate_pages_stats {
>>  	int nr_thp_split;
>>  };
>>  
>> +static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
>> +			    free_page_t put_new_page, unsigned long private,
>> +			    enum migrate_mode mode, int reason,
>> +			    struct migrate_pages_stats *stats,
>> +			    struct list_head *ret_folios)
>> +{
>> +	int retry = 1;
>> +	int nr_failed = 0;
>> +	int nr_retry_pages = 0;
>> +	int pass = 0;
>> +	struct folio *folio, *folio2;
>> +	int rc = 0, nr_pages;
>> +
>> +	for (pass = 0; pass < 10 && retry; pass++) {
>> +		retry = 0;
>> +		nr_retry_pages = 0;
>> +
>> +		list_for_each_entry_safe(folio, folio2, from, lru) {
>> +			if (!folio_test_hugetlb(folio))
>> +				continue;
>> +
>> +			nr_pages = folio_nr_pages(folio);
>> +
>> +			cond_resched();
>> +
>> +			rc = unmap_and_move_huge_page(get_new_page,
>> +						      put_new_page, private,
>> +						      &folio->page, pass > 2, mode,
>> +						      reason, ret_folios);
>> +			/*
>> +			 * The rules are:
>> +			 *	Success: hugetlb folio will be put back
>> +			 *	-EAGAIN: stay on the from list
>> +			 *	-ENOMEM: stay on the from list
>> +			 *	-ENOSYS: stay on the from list
>> +			 *	Other errno: put on ret_folios list
>> +			 */
>> +			switch(rc) {
>> +			case -ENOSYS:
>> +				/* Hugetlb migration is unsupported */
>> +				nr_failed++;
>> +				stats->nr_failed_pages += nr_pages;
>> +				list_move_tail(&folio->lru, ret_folios);
>> +				break;
>> +			case -ENOMEM:
>> +				/*
>> +				 * When memory is low, don't bother to try to migrate
>> +				 * other folios, just exit.
>> +				 */
>> +				nr_failed++;
>
> This currently isn't relevant for -ENOMEM and I think it would be
> clearer if it was dropped.

OK.

>> +				stats->nr_failed_pages += nr_pages;
>
> Makes sense not to continue migration with low memory, but shouldn't we
> add the remaining unmigrated hugetlb folios to stats->nr_failed_pages as
> well? Ie. don't we still have to continue the iteration to to find and
> account for these?

I think nr_failed_pages only counts pages that were actually tried.
IIUC, that's the original behavior, and the behavior for non-hugetlb
pages too.

>> +				goto out;
>
> Given this is the only use of the out label, and that there is a special
> case for -ENOMEM there anyway I think it would be clearer to return
> directly.

Sounds good.  Will do that in next version.

>> +			case -EAGAIN:
>> +				retry++;
>> +				nr_retry_pages += nr_pages;
>> +				break;
>> +			case MIGRATEPAGE_SUCCESS:
>> +				stats->nr_succeeded += nr_pages;
>> +				break;
>> +			default:
>> +				/*
>> +				 * Permanent failure (-EBUSY, etc.):
>> +				 * unlike -EAGAIN case, the failed folio is
>> +				 * removed from migration folio list and not
>> +				 * retried in the next outer loop.
>> +				 */
>> +				nr_failed++;
>> +				stats->nr_failed_pages += nr_pages;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +out:
>> +	nr_failed += retry;
>> +	stats->nr_failed_pages += nr_retry_pages;
>> +	if (rc != -ENOMEM)
>> +		rc = nr_failed;
>> +
>> +	return rc;
>> +}
>> +
>>  /*
>>   * migrate_pages - migrate the folios specified in a list, to the free folios
>>   *		   supplied as the target for the page migration
>> @@ -1437,7 +1518,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  	int retry = 1;
>>  	int large_retry = 1;
>>  	int thp_retry = 1;
>> -	int nr_failed = 0;
>> +	int nr_failed;
>>  	int nr_retry_pages = 0;
>>  	int nr_large_failed = 0;
>>  	int pass = 0;
>> @@ -1454,6 +1535,12 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  	trace_mm_migrate_pages_start(mode, reason);
>>  
>>  	memset(&stats, 0, sizeof(stats));
>> +	rc = migrate_hugetlbs(from, get_new_page, put_new_page, private, mode, reason,
>> +			      &stats, &ret_folios);
>> +	if (rc < 0)
>> +		goto out;
>> +	nr_failed = rc;
>> +
>>  split_folio_migration:
>>  	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
>>  		retry = 0;
>> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  		nr_retry_pages = 0;
>>  
>>  		list_for_each_entry_safe(folio, folio2, from, lru) {
>> +			if (folio_test_hugetlb(folio)) {
>
> How do we hit this case? Shouldn't migrate_hugetlbs() have already moved
> any hugetlb folios off the from list?

Retried hugetlb folios will be kept in from list.

>> +				list_move_tail(&folio->lru, &ret_folios);
>> +				continue;
>> +			}
>> +
>>  			/*
>>  			 * Large folio statistics is based on the source large
>>  			 * folio. Capture required information that might get
>>  			 * lost during migration.
>>  			 */
>> -			is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
>> +			is_large = folio_test_large(folio);
>>  			is_thp = is_large && folio_test_pmd_mappable(folio);
>>  			nr_pages = folio_nr_pages(folio);
>> +
>>  			cond_resched();
>>  
>> -			if (folio_test_hugetlb(folio))
>> -				rc = unmap_and_move_huge_page(get_new_page,
>> -						put_new_page, private,
>> -						&folio->page, pass > 2, mode,
>> -						reason,
>> -						&ret_folios);
>> -			else
>> -				rc = unmap_and_move(get_new_page, put_new_page,
>> -						private, folio, pass > 2, mode,
>> -						reason, &ret_folios);
>> +			rc = unmap_and_move(get_new_page, put_new_page,
>> +					    private, folio, pass > 2, mode,
>> +					    reason, &ret_folios);
>>  			/*
>>  			 * The rules are:
>> -			 *	Success: non hugetlb folio will be freed, hugetlb
>> -			 *		 folio will be put back
>> +			 *	Success: folio will be freed
>>  			 *	-EAGAIN: stay on the from list
>>  			 *	-ENOMEM: stay on the from list
>>  			 *	-ENOSYS: stay on the from list
>> @@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  						stats.nr_thp_split += is_thp;
>>  						break;
>>  					}
>> -				/* Hugetlb migration is unsupported */
>>  				} else if (!no_split_folio_counting) {
>>  					nr_failed++;
>>  				}

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/8] migrate_pages: organize stats with struct migrate_pages_stats
  2023-01-05  3:02   ` Alistair Popple
@ 2023-01-05  5:53     ` Huang, Ying
  2023-01-05  6:50       ` Alistair Popple
  0 siblings, 1 reply; 39+ messages in thread
From: Huang, Ying @ 2023-01-05  5:53 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin

Alistair Popple <apopple@nvidia.com> writes:

> Huang Ying <ying.huang@intel.com> writes:
>
>> Define struct migrate_pages_stats to organize the various statistics
>> in migrate_pages().  This makes it easier to collect and consume the
>> statistics in multiple functions.  This will be needed in the
>> following patches in the series.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Bharata B Rao <bharata@amd.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: haoxin <xhao@linux.alibaba.com>
>> ---
>>  mm/migrate.c | 58 +++++++++++++++++++++++++++++-----------------------
>>  1 file changed, 32 insertions(+), 26 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index a4d3fc65085f..ec9263a33d38 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1396,6 +1396,14 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
>>  	return rc;
>>  }
>>  
>> +struct migrate_pages_stats {
>> +	int nr_succeeded;
>> +	int nr_failed_pages;
>> +	int nr_thp_succeeded;
>> +	int nr_thp_failed;
>> +	int nr_thp_split;
>
> I think some brief comments in the code for what each stat is tracking
> and their relationship to each other would be helpful (ie. does
> nr_succeeded include thp subpages, etc). Or at least a reference to
> where this is documented (ie. page_migration.rst) as I recall there has
> been some confusion in the past that has led to bugs.

OK, will do that in the next version.

> Otherwise the patch looks good so:
>
> Reviewed-by: Alistair Popple <apopple@nvidia.com>

Thanks!

Best Regards,
Huang, Ying

>> +};
>> +
>>  /*
>>   * migrate_pages - migrate the folios specified in a list, to the free folios
>>   *		   supplied as the target for the page migration
>> @@ -1430,13 +1438,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  	int large_retry = 1;
>>  	int thp_retry = 1;
>>  	int nr_failed = 0;
>> -	int nr_failed_pages = 0;
>>  	int nr_retry_pages = 0;
>> -	int nr_succeeded = 0;
>> -	int nr_thp_succeeded = 0;
>>  	int nr_large_failed = 0;
>> -	int nr_thp_failed = 0;
>> -	int nr_thp_split = 0;
>>  	int pass = 0;
>>  	bool is_large = false;
>>  	bool is_thp = false;
>> @@ -1446,9 +1449,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  	LIST_HEAD(split_folios);
>>  	bool nosplit = (reason == MR_NUMA_MISPLACED);
>>  	bool no_split_folio_counting = false;
>> +	struct migrate_pages_stats stats;
>>  
>>  	trace_mm_migrate_pages_start(mode, reason);
>>  
>> +	memset(&stats, 0, sizeof(stats));
>>  split_folio_migration:
>>  	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
>>  		retry = 0;
>> @@ -1502,9 +1507,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  				/* Large folio migration is unsupported */
>>  				if (is_large) {
>>  					nr_large_failed++;
>> -					nr_thp_failed += is_thp;
>> +					stats.nr_thp_failed += is_thp;
>>  					if (!try_split_folio(folio, &split_folios)) {
>> -						nr_thp_split += is_thp;
>> +						stats.nr_thp_split += is_thp;
>>  						break;
>>  					}
>>  				/* Hugetlb migration is unsupported */
>> @@ -1512,7 +1517,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  					nr_failed++;
>>  				}
>>  
>> -				nr_failed_pages += nr_pages;
>> +				stats.nr_failed_pages += nr_pages;
>>  				list_move_tail(&folio->lru, &ret_folios);
>>  				break;
>>  			case -ENOMEM:
>> @@ -1522,13 +1527,13 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  				 */
>>  				if (is_large) {
>>  					nr_large_failed++;
>> -					nr_thp_failed += is_thp;
>> +					stats.nr_thp_failed += is_thp;
>>  					/* Large folio NUMA faulting doesn't split to retry. */
>>  					if (!nosplit) {
>>  						int ret = try_split_folio(folio, &split_folios);
>>  
>>  						if (!ret) {
>> -							nr_thp_split += is_thp;
>> +							stats.nr_thp_split += is_thp;
>>  							break;
>>  						} else if (reason == MR_LONGTERM_PIN &&
>>  							   ret == -EAGAIN) {
>> @@ -1546,7 +1551,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  					nr_failed++;
>>  				}
>>  
>> -				nr_failed_pages += nr_pages + nr_retry_pages;
>> +				stats.nr_failed_pages += nr_pages + nr_retry_pages;
>>  				/*
>>  				 * There might be some split folios of fail-to-migrate large
>>  				 * folios left in split_folios list. Move them back to migration
>> @@ -1556,7 +1561,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  				list_splice_init(&split_folios, from);
>>  				/* nr_failed isn't updated for not used */
>>  				nr_large_failed += large_retry;
>> -				nr_thp_failed += thp_retry;
>> +				stats.nr_thp_failed += thp_retry;
>>  				goto out;
>>  			case -EAGAIN:
>>  				if (is_large) {
>> @@ -1568,8 +1573,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  				nr_retry_pages += nr_pages;
>>  				break;
>>  			case MIGRATEPAGE_SUCCESS:
>> -				nr_succeeded += nr_pages;
>> -				nr_thp_succeeded += is_thp;
>> +				stats.nr_succeeded += nr_pages;
>> +				stats.nr_thp_succeeded += is_thp;
>>  				break;
>>  			default:
>>  				/*
>> @@ -1580,20 +1585,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  				 */
>>  				if (is_large) {
>>  					nr_large_failed++;
>> -					nr_thp_failed += is_thp;
>> +					stats.nr_thp_failed += is_thp;
>>  				} else if (!no_split_folio_counting) {
>>  					nr_failed++;
>>  				}
>>  
>> -				nr_failed_pages += nr_pages;
>> +				stats.nr_failed_pages += nr_pages;
>>  				break;
>>  			}
>>  		}
>>  	}
>>  	nr_failed += retry;
>>  	nr_large_failed += large_retry;
>> -	nr_thp_failed += thp_retry;
>> -	nr_failed_pages += nr_retry_pages;
>> +	stats.nr_thp_failed += thp_retry;
>> +	stats.nr_failed_pages += nr_retry_pages;
>>  	/*
>>  	 * Try to migrate split folios of fail-to-migrate large folios, no
>>  	 * nr_failed counting in this round, since all split folios of a
>> @@ -1626,16 +1631,17 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  	if (list_empty(from))
>>  		rc = 0;
>>  
>> -	count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
>> -	count_vm_events(PGMIGRATE_FAIL, nr_failed_pages);
>> -	count_vm_events(THP_MIGRATION_SUCCESS, nr_thp_succeeded);
>> -	count_vm_events(THP_MIGRATION_FAIL, nr_thp_failed);
>> -	count_vm_events(THP_MIGRATION_SPLIT, nr_thp_split);
>> -	trace_mm_migrate_pages(nr_succeeded, nr_failed_pages, nr_thp_succeeded,
>> -			       nr_thp_failed, nr_thp_split, mode, reason);
>> +	count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded);
>> +	count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages);
>> +	count_vm_events(THP_MIGRATION_SUCCESS, stats.nr_thp_succeeded);
>> +	count_vm_events(THP_MIGRATION_FAIL, stats.nr_thp_failed);
>> +	count_vm_events(THP_MIGRATION_SPLIT, stats.nr_thp_split);
>> +	trace_mm_migrate_pages(stats.nr_succeeded, stats.nr_failed_pages,
>> +			       stats.nr_thp_succeeded, stats.nr_thp_failed,
>> +			       stats.nr_thp_split, mode, reason);
>>  
>>  	if (ret_succeeded)
>> -		*ret_succeeded = nr_succeeded;
>> +		*ret_succeeded = stats.nr_succeeded;
>>  
>>  	return rc;
>>  }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/8] migrate_pages: separate hugetlb folios migration
  2023-01-05  5:51     ` Huang, Ying
@ 2023-01-05  6:43       ` Alistair Popple
  2023-01-05  7:31         ` Huang, Ying
  0 siblings, 1 reply; 39+ messages in thread
From: Alistair Popple @ 2023-01-05  6:43 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin


"Huang, Ying" <ying.huang@intel.com> writes:

> Alistair Popple <apopple@nvidia.com> writes:
>
>> Huang Ying <ying.huang@intel.com> writes:
>>
>>> This is a preparation patch to batch the folio unmapping and moving
>>> for the non-hugetlb folios.  Based on that we can batch the TLB
>>> shootdown during the folio migration and make it possible to use some
>>> hardware accelerator for the folio copying.
>>>
>>> In this patch the hugetlb folios and non-hugetlb folios migration is
>>> separated in migrate_pages() to make it easy to change the non-hugetlb
>>> folios migration implementation.
>>>
>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Yang Shi <shy828301@gmail.com>
>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Cc: Oscar Salvador <osalvador@suse.de>
>>> Cc: Matthew Wilcox <willy@infradead.org>
>>> Cc: Bharata B Rao <bharata@amd.com>
>>> Cc: Alistair Popple <apopple@nvidia.com>
>>> Cc: haoxin <xhao@linux.alibaba.com>
>>> ---
>>>  mm/migrate.c | 114 ++++++++++++++++++++++++++++++++++++++++++++-------
>>>  1 file changed, 99 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index ec9263a33d38..bdbe73fe2eb7 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1404,6 +1404,87 @@ struct migrate_pages_stats {
>>>  	int nr_thp_split;
>>>  };
>>>  
>>> +static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
>>> +			    free_page_t put_new_page, unsigned long private,
>>> +			    enum migrate_mode mode, int reason,
>>> +			    struct migrate_pages_stats *stats,
>>> +			    struct list_head *ret_folios)
>>> +{
>>> +	int retry = 1;
>>> +	int nr_failed = 0;
>>> +	int nr_retry_pages = 0;
>>> +	int pass = 0;
>>> +	struct folio *folio, *folio2;
>>> +	int rc = 0, nr_pages;
>>> +
>>> +	for (pass = 0; pass < 10 && retry; pass++) {
>>> +		retry = 0;
>>> +		nr_retry_pages = 0;
>>> +
>>> +		list_for_each_entry_safe(folio, folio2, from, lru) {
>>> +			if (!folio_test_hugetlb(folio))
>>> +				continue;
>>> +
>>> +			nr_pages = folio_nr_pages(folio);
>>> +
>>> +			cond_resched();
>>> +
>>> +			rc = unmap_and_move_huge_page(get_new_page,
>>> +						      put_new_page, private,
>>> +						      &folio->page, pass > 2, mode,
>>> +						      reason, ret_folios);
>>> +			/*
>>> +			 * The rules are:
>>> +			 *	Success: hugetlb folio will be put back
>>> +			 *	-EAGAIN: stay on the from list
>>> +			 *	-ENOMEM: stay on the from list
>>> +			 *	-ENOSYS: stay on the from list
>>> +			 *	Other errno: put on ret_folios list
>>> +			 */
>>> +			switch(rc) {
>>> +			case -ENOSYS:
>>> +				/* Hugetlb migration is unsupported */
>>> +				nr_failed++;
>>> +				stats->nr_failed_pages += nr_pages;
>>> +				list_move_tail(&folio->lru, ret_folios);
>>> +				break;
>>> +			case -ENOMEM:
>>> +				/*
>>> +				 * When memory is low, don't bother to try to migrate
>>> +				 * other folios, just exit.
>>> +				 */
>>> +				nr_failed++;
>>
>> This currently isn't relevant for -ENOMEM and I think it would be
>> clearer if it was dropped.
>
> OK.
>
>>> +				stats->nr_failed_pages += nr_pages;
>>
>> Makes sense not to continue migration with low memory, but shouldn't we
>> add the remaining unmigrated hugetlb folios to stats->nr_failed_pages as
>> well? Ie. don't we still have to continue the iteration to find and
>> account for these?
>
> I think nr_failed_pages only counts tried pages.  IIUC, it's the
> original behavior and behavior for non-hugetlb pages too.

Hmm, I agree this seems to be the original behaviour, but that behaviour
seems arbitrary and wrong IMHO. The page failed to migrate, therefore it
should count as such. The fact that we didn't even try seems irrelevant.

Indeed it looks like this was introduced because it was confusing to see
no failures even though migrate_pages() was called - see dfef2ef4027b
("mm, migrate: increment fail count on ENOMEM").

But that seems inconsistent - why count this one folio as failed because
of the allocation failure, while other folios which would also likely
cause allocation failures don't get counted? Fixing it is probably
outside the scope of this series so I won't insist, but it would be
nice, as the current behaviour could still lead to confusion in some
scenarios.

[...]

>>> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  		nr_retry_pages = 0;
>>>  
>>>  		list_for_each_entry_safe(folio, folio2, from, lru) {
>>> +			if (folio_test_hugetlb(folio)) {
>>
>> How do we hit this case? Shouldn't migrate_hugetlbs() have already moved
>> any hugetlb folios off the from list?
>
> Retried hugetlb folios will be kept in from list.

Couldn't migrate_hugetlbs() remove the failing retried pages from the
list on the final pass? That seems cleaner to me.

>>> +				list_move_tail(&folio->lru, &ret_folios);
>>> +				continue;
>>> +			}
>>> +
>>>  			/*
>>>  			 * Large folio statistics is based on the source large
>>>  			 * folio. Capture required information that might get
>>>  			 * lost during migration.
>>>  			 */
>>> -			is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
>>> +			is_large = folio_test_large(folio);
>>>  			is_thp = is_large && folio_test_pmd_mappable(folio);
>>>  			nr_pages = folio_nr_pages(folio);
>>> +
>>>  			cond_resched();
>>>  
>>> -			if (folio_test_hugetlb(folio))
>>> -				rc = unmap_and_move_huge_page(get_new_page,
>>> -						put_new_page, private,
>>> -						&folio->page, pass > 2, mode,
>>> -						reason,
>>> -						&ret_folios);
>>> -			else
>>> -				rc = unmap_and_move(get_new_page, put_new_page,
>>> -						private, folio, pass > 2, mode,
>>> -						reason, &ret_folios);
>>> +			rc = unmap_and_move(get_new_page, put_new_page,
>>> +					    private, folio, pass > 2, mode,
>>> +					    reason, &ret_folios);
>>>  			/*
>>>  			 * The rules are:
>>> -			 *	Success: non hugetlb folio will be freed, hugetlb
>>> -			 *		 folio will be put back
>>> +			 *	Success: folio will be freed
>>>  			 *	-EAGAIN: stay on the from list
>>>  			 *	-ENOMEM: stay on the from list
>>>  			 *	-ENOSYS: stay on the from list
>>> @@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  						stats.nr_thp_split += is_thp;
>>>  						break;
>>>  					}
>>> -				/* Hugetlb migration is unsupported */
>>>  				} else if (!no_split_folio_counting) {
>>>  					nr_failed++;
>>>  				}
>
> Best Regards,
> Huang, Ying


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/8] migrate_pages: organize stats with struct migrate_pages_stats
  2023-01-05  5:53     ` Huang, Ying
@ 2023-01-05  6:50       ` Alistair Popple
  2023-01-05  7:06         ` Huang, Ying
  0 siblings, 1 reply; 39+ messages in thread
From: Alistair Popple @ 2023-01-05  6:50 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin


"Huang, Ying" <ying.huang@intel.com> writes:

> Alistair Popple <apopple@nvidia.com> writes:
>
>> Huang Ying <ying.huang@intel.com> writes:
>>
>>> Define struct migrate_pages_stats to organize the various statistics
>>> in migrate_pages().  This makes it easier to collect and consume the
>>> statistics in multiple functions.  This will be needed in the
>>> following patches in the series.
>>>
>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Yang Shi <shy828301@gmail.com>
>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Cc: Oscar Salvador <osalvador@suse.de>
>>> Cc: Matthew Wilcox <willy@infradead.org>
>>> Cc: Bharata B Rao <bharata@amd.com>
>>> Cc: Alistair Popple <apopple@nvidia.com>
>>> Cc: haoxin <xhao@linux.alibaba.com>
>>> ---
>>>  mm/migrate.c | 58 +++++++++++++++++++++++++++++-----------------------
>>>  1 file changed, 32 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index a4d3fc65085f..ec9263a33d38 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1396,6 +1396,14 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
>>>  	return rc;
>>>  }
>>>  
>>> +struct migrate_pages_stats {
>>> +	int nr_succeeded;
>>> +	int nr_failed_pages;
>>> +	int nr_thp_succeeded;
>>> +	int nr_thp_failed;
>>> +	int nr_thp_split;
>>
>> I think some brief comments in the code for what each stat is tracking
>> and their relationship to each other would be helpful (ie. does
>> nr_succeeded include thp subpages, etc). Or at least a reference to
>> where this is documented (ie. page_migration.rst) as I recall there has
>> been some confusion in the past that has led to bugs.
>
> OK, will do that in the next version.

You should add that nr_failed_pages doesn't count failures of migrations
that weren't attempted because of e.g. an allocation failure, as that
was a surprising detail to me at least. Unless of course you decide to
fix that :-)
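
A sketch of the kind of comments I mean, based only on how the counters
are updated in this patch (illustrative, the exact wording is of course
up to you):

struct migrate_pages_stats {
	int nr_succeeded;	/* Normal and large folios migrated successfully,
				   in units of base pages */
	int nr_failed_pages;	/* Normal and large folios failed to be migrated,
				   in units of base pages.  Folios that were never
				   tried (e.g. after -ENOMEM) are not counted */
	int nr_thp_succeeded;	/* THP migrated successfully */
	int nr_thp_failed;	/* THP failed to be migrated */
	int nr_thp_split;	/* THP split before migrating */
};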

>> Otherwise the patch looks good so:
>>
>> Reviewed-by: Alistair Popple <apopple@nvidia.com>
>
> Thanks!
>
> Best Regards,
> Huang, Ying
>
>>> +};
>>> +
>>>  /*
>>>   * migrate_pages - migrate the folios specified in a list, to the free folios
>>>   *		   supplied as the target for the page migration
>>> @@ -1430,13 +1438,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  	int large_retry = 1;
>>>  	int thp_retry = 1;
>>>  	int nr_failed = 0;
>>> -	int nr_failed_pages = 0;
>>>  	int nr_retry_pages = 0;
>>> -	int nr_succeeded = 0;
>>> -	int nr_thp_succeeded = 0;
>>>  	int nr_large_failed = 0;
>>> -	int nr_thp_failed = 0;
>>> -	int nr_thp_split = 0;
>>>  	int pass = 0;
>>>  	bool is_large = false;
>>>  	bool is_thp = false;
>>> @@ -1446,9 +1449,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  	LIST_HEAD(split_folios);
>>>  	bool nosplit = (reason == MR_NUMA_MISPLACED);
>>>  	bool no_split_folio_counting = false;
>>> +	struct migrate_pages_stats stats;
>>>  
>>>  	trace_mm_migrate_pages_start(mode, reason);
>>>  
>>> +	memset(&stats, 0, sizeof(stats));
>>>  split_folio_migration:
>>>  	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
>>>  		retry = 0;
>>> @@ -1502,9 +1507,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  				/* Large folio migration is unsupported */
>>>  				if (is_large) {
>>>  					nr_large_failed++;
>>> -					nr_thp_failed += is_thp;
>>> +					stats.nr_thp_failed += is_thp;
>>>  					if (!try_split_folio(folio, &split_folios)) {
>>> -						nr_thp_split += is_thp;
>>> +						stats.nr_thp_split += is_thp;
>>>  						break;
>>>  					}
>>>  				/* Hugetlb migration is unsupported */
>>> @@ -1512,7 +1517,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  					nr_failed++;
>>>  				}
>>>  
>>> -				nr_failed_pages += nr_pages;
>>> +				stats.nr_failed_pages += nr_pages;
>>>  				list_move_tail(&folio->lru, &ret_folios);
>>>  				break;
>>>  			case -ENOMEM:
>>> @@ -1522,13 +1527,13 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  				 */
>>>  				if (is_large) {
>>>  					nr_large_failed++;
>>> -					nr_thp_failed += is_thp;
>>> +					stats.nr_thp_failed += is_thp;
>>>  					/* Large folio NUMA faulting doesn't split to retry. */
>>>  					if (!nosplit) {
>>>  						int ret = try_split_folio(folio, &split_folios);
>>>  
>>>  						if (!ret) {
>>> -							nr_thp_split += is_thp;
>>> +							stats.nr_thp_split += is_thp;
>>>  							break;
>>>  						} else if (reason == MR_LONGTERM_PIN &&
>>>  							   ret == -EAGAIN) {
>>> @@ -1546,7 +1551,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  					nr_failed++;
>>>  				}
>>>  
>>> -				nr_failed_pages += nr_pages + nr_retry_pages;
>>> +				stats.nr_failed_pages += nr_pages + nr_retry_pages;
>>>  				/*
>>>  				 * There might be some split folios of fail-to-migrate large
>>>  				 * folios left in split_folios list. Move them back to migration
>>> @@ -1556,7 +1561,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  				list_splice_init(&split_folios, from);
>>>  				/* nr_failed isn't updated for not used */
>>>  				nr_large_failed += large_retry;
>>> -				nr_thp_failed += thp_retry;
>>> +				stats.nr_thp_failed += thp_retry;
>>>  				goto out;
>>>  			case -EAGAIN:
>>>  				if (is_large) {
>>> @@ -1568,8 +1573,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  				nr_retry_pages += nr_pages;
>>>  				break;
>>>  			case MIGRATEPAGE_SUCCESS:
>>> -				nr_succeeded += nr_pages;
>>> -				nr_thp_succeeded += is_thp;
>>> +				stats.nr_succeeded += nr_pages;
>>> +				stats.nr_thp_succeeded += is_thp;
>>>  				break;
>>>  			default:
>>>  				/*
>>> @@ -1580,20 +1585,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  				 */
>>>  				if (is_large) {
>>>  					nr_large_failed++;
>>> -					nr_thp_failed += is_thp;
>>> +					stats.nr_thp_failed += is_thp;
>>>  				} else if (!no_split_folio_counting) {
>>>  					nr_failed++;
>>>  				}
>>>  
>>> -				nr_failed_pages += nr_pages;
>>> +				stats.nr_failed_pages += nr_pages;
>>>  				break;
>>>  			}
>>>  		}
>>>  	}
>>>  	nr_failed += retry;
>>>  	nr_large_failed += large_retry;
>>> -	nr_thp_failed += thp_retry;
>>> -	nr_failed_pages += nr_retry_pages;
>>> +	stats.nr_thp_failed += thp_retry;
>>> +	stats.nr_failed_pages += nr_retry_pages;
>>>  	/*
>>>  	 * Try to migrate split folios of fail-to-migrate large folios, no
>>>  	 * nr_failed counting in this round, since all split folios of a
>>> @@ -1626,16 +1631,17 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  	if (list_empty(from))
>>>  		rc = 0;
>>>  
>>> -	count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
>>> -	count_vm_events(PGMIGRATE_FAIL, nr_failed_pages);
>>> -	count_vm_events(THP_MIGRATION_SUCCESS, nr_thp_succeeded);
>>> -	count_vm_events(THP_MIGRATION_FAIL, nr_thp_failed);
>>> -	count_vm_events(THP_MIGRATION_SPLIT, nr_thp_split);
>>> -	trace_mm_migrate_pages(nr_succeeded, nr_failed_pages, nr_thp_succeeded,
>>> -			       nr_thp_failed, nr_thp_split, mode, reason);
>>> +	count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded);
>>> +	count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages);
>>> +	count_vm_events(THP_MIGRATION_SUCCESS, stats.nr_thp_succeeded);
>>> +	count_vm_events(THP_MIGRATION_FAIL, stats.nr_thp_failed);
>>> +	count_vm_events(THP_MIGRATION_SPLIT, stats.nr_thp_split);
>>> +	trace_mm_migrate_pages(stats.nr_succeeded, stats.nr_failed_pages,
>>> +			       stats.nr_thp_succeeded, stats.nr_thp_failed,
>>> +			       stats.nr_thp_split, mode, reason);
>>>  
>>>  	if (ret_succeeded)
>>> -		*ret_succeeded = nr_succeeded;
>>> +		*ret_succeeded = stats.nr_succeeded;
>>>  
>>>  	return rc;
>>>  }


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/8] migrate_pages: organize stats with struct migrate_pages_stats
  2023-01-05  6:50       ` Alistair Popple
@ 2023-01-05  7:06         ` Huang, Ying
  0 siblings, 0 replies; 39+ messages in thread
From: Huang, Ying @ 2023-01-05  7:06 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin

Alistair Popple <apopple@nvidia.com> writes:

> "Huang, Ying" <ying.huang@intel.com> writes:
>
>> Alistair Popple <apopple@nvidia.com> writes:
>>
>>> Huang Ying <ying.huang@intel.com> writes:
>>>
>>>> Define struct migrate_pages_stats to organize the various statistics
>>>> in migrate_pages().  This makes it easier to collect and consume the
>>>> statistics in multiple functions.  This will be needed in the
>>>> following patches in the series.
>>>>
>>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>> Cc: Yang Shi <shy828301@gmail.com>
>>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>> Cc: Matthew Wilcox <willy@infradead.org>
>>>> Cc: Bharata B Rao <bharata@amd.com>
>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>> Cc: haoxin <xhao@linux.alibaba.com>
>>>> ---
>>>>  mm/migrate.c | 58 +++++++++++++++++++++++++++++-----------------------
>>>>  1 file changed, 32 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index a4d3fc65085f..ec9263a33d38 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1396,6 +1396,14 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
>>>>  	return rc;
>>>>  }
>>>>  
>>>> +struct migrate_pages_stats {
>>>> +	int nr_succeeded;
>>>> +	int nr_failed_pages;
>>>> +	int nr_thp_succeeded;
>>>> +	int nr_thp_failed;
>>>> +	int nr_thp_split;
>>>
>>> I think some brief comments in the code for what each stat is tracking
>>> and their relationship to each other would be helpful (ie. does
>>> nr_succeeded include thp subpages, etc). Or at least a reference to
>>> where this is documented (ie. page_migration.rst) as I recall there has
>>> been some confusion in the past that has led to bugs.
>>
>> OK, will do that in the next version.
>
> You should add that nr_failed_pages doesn't count failures of migrations
> that weren't attempted because of e.g. an allocation failure, as that
> was a surprising detail to me at least. Unless of course you decide to
> fix that :-)

nr_failed_pages is used for /proc/vmstat.  The move_pages() syscall
cares about how many pages were requested but not tried, but the
system-wide statistics don't care about that.  I think that is
appropriate.

Best Regards,
Huang, Ying

>>> Otherwise the patch looks good so:
>>>
>>> Reviewed-by: Alistair Popple <apopple@nvidia.com>
>>
>> Thanks!
>>
>> Best Regards,
>> Huang, Ying
>>
>>>> +};
>>>> +
>>>>  /*
>>>>   * migrate_pages - migrate the folios specified in a list, to the free folios
>>>>   *		   supplied as the target for the page migration
>>>> @@ -1430,13 +1438,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  	int large_retry = 1;
>>>>  	int thp_retry = 1;
>>>>  	int nr_failed = 0;
>>>> -	int nr_failed_pages = 0;
>>>>  	int nr_retry_pages = 0;
>>>> -	int nr_succeeded = 0;
>>>> -	int nr_thp_succeeded = 0;
>>>>  	int nr_large_failed = 0;
>>>> -	int nr_thp_failed = 0;
>>>> -	int nr_thp_split = 0;
>>>>  	int pass = 0;
>>>>  	bool is_large = false;
>>>>  	bool is_thp = false;
>>>> @@ -1446,9 +1449,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  	LIST_HEAD(split_folios);
>>>>  	bool nosplit = (reason == MR_NUMA_MISPLACED);
>>>>  	bool no_split_folio_counting = false;
>>>> +	struct migrate_pages_stats stats;
>>>>  
>>>>  	trace_mm_migrate_pages_start(mode, reason);
>>>>  
>>>> +	memset(&stats, 0, sizeof(stats));
>>>>  split_folio_migration:
>>>>  	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
>>>>  		retry = 0;
>>>> @@ -1502,9 +1507,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  				/* Large folio migration is unsupported */
>>>>  				if (is_large) {
>>>>  					nr_large_failed++;
>>>> -					nr_thp_failed += is_thp;
>>>> +					stats.nr_thp_failed += is_thp;
>>>>  					if (!try_split_folio(folio, &split_folios)) {
>>>> -						nr_thp_split += is_thp;
>>>> +						stats.nr_thp_split += is_thp;
>>>>  						break;
>>>>  					}
>>>>  				/* Hugetlb migration is unsupported */
>>>> @@ -1512,7 +1517,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  					nr_failed++;
>>>>  				}
>>>>  
>>>> -				nr_failed_pages += nr_pages;
>>>> +				stats.nr_failed_pages += nr_pages;
>>>>  				list_move_tail(&folio->lru, &ret_folios);
>>>>  				break;
>>>>  			case -ENOMEM:
>>>> @@ -1522,13 +1527,13 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  				 */
>>>>  				if (is_large) {
>>>>  					nr_large_failed++;
>>>> -					nr_thp_failed += is_thp;
>>>> +					stats.nr_thp_failed += is_thp;
>>>>  					/* Large folio NUMA faulting doesn't split to retry. */
>>>>  					if (!nosplit) {
>>>>  						int ret = try_split_folio(folio, &split_folios);
>>>>  
>>>>  						if (!ret) {
>>>> -							nr_thp_split += is_thp;
>>>> +							stats.nr_thp_split += is_thp;
>>>>  							break;
>>>>  						} else if (reason == MR_LONGTERM_PIN &&
>>>>  							   ret == -EAGAIN) {
>>>> @@ -1546,7 +1551,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  					nr_failed++;
>>>>  				}
>>>>  
>>>> -				nr_failed_pages += nr_pages + nr_retry_pages;
>>>> +				stats.nr_failed_pages += nr_pages + nr_retry_pages;
>>>>  				/*
>>>>  				 * There might be some split folios of fail-to-migrate large
>>>>  				 * folios left in split_folios list. Move them back to migration
>>>> @@ -1556,7 +1561,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  				list_splice_init(&split_folios, from);
>>>>  				/* nr_failed isn't updated for not used */
>>>>  				nr_large_failed += large_retry;
>>>> -				nr_thp_failed += thp_retry;
>>>> +				stats.nr_thp_failed += thp_retry;
>>>>  				goto out;
>>>>  			case -EAGAIN:
>>>>  				if (is_large) {
>>>> @@ -1568,8 +1573,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  				nr_retry_pages += nr_pages;
>>>>  				break;
>>>>  			case MIGRATEPAGE_SUCCESS:
>>>> -				nr_succeeded += nr_pages;
>>>> -				nr_thp_succeeded += is_thp;
>>>> +				stats.nr_succeeded += nr_pages;
>>>> +				stats.nr_thp_succeeded += is_thp;
>>>>  				break;
>>>>  			default:
>>>>  				/*
>>>> @@ -1580,20 +1585,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  				 */
>>>>  				if (is_large) {
>>>>  					nr_large_failed++;
>>>> -					nr_thp_failed += is_thp;
>>>> +					stats.nr_thp_failed += is_thp;
>>>>  				} else if (!no_split_folio_counting) {
>>>>  					nr_failed++;
>>>>  				}
>>>>  
>>>> -				nr_failed_pages += nr_pages;
>>>> +				stats.nr_failed_pages += nr_pages;
>>>>  				break;
>>>>  			}
>>>>  		}
>>>>  	}
>>>>  	nr_failed += retry;
>>>>  	nr_large_failed += large_retry;
>>>> -	nr_thp_failed += thp_retry;
>>>> -	nr_failed_pages += nr_retry_pages;
>>>> +	stats.nr_thp_failed += thp_retry;
>>>> +	stats.nr_failed_pages += nr_retry_pages;
>>>>  	/*
>>>>  	 * Try to migrate split folios of fail-to-migrate large folios, no
>>>>  	 * nr_failed counting in this round, since all split folios of a
>>>> @@ -1626,16 +1631,17 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  	if (list_empty(from))
>>>>  		rc = 0;
>>>>  
>>>> -	count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
>>>> -	count_vm_events(PGMIGRATE_FAIL, nr_failed_pages);
>>>> -	count_vm_events(THP_MIGRATION_SUCCESS, nr_thp_succeeded);
>>>> -	count_vm_events(THP_MIGRATION_FAIL, nr_thp_failed);
>>>> -	count_vm_events(THP_MIGRATION_SPLIT, nr_thp_split);
>>>> -	trace_mm_migrate_pages(nr_succeeded, nr_failed_pages, nr_thp_succeeded,
>>>> -			       nr_thp_failed, nr_thp_split, mode, reason);
>>>> +	count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded);
>>>> +	count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages);
>>>> +	count_vm_events(THP_MIGRATION_SUCCESS, stats.nr_thp_succeeded);
>>>> +	count_vm_events(THP_MIGRATION_FAIL, stats.nr_thp_failed);
>>>> +	count_vm_events(THP_MIGRATION_SPLIT, stats.nr_thp_split);
>>>> +	trace_mm_migrate_pages(stats.nr_succeeded, stats.nr_failed_pages,
>>>> +			       stats.nr_thp_succeeded, stats.nr_thp_failed,
>>>> +			       stats.nr_thp_split, mode, reason);
>>>>  
>>>>  	if (ret_succeeded)
>>>> -		*ret_succeeded = nr_succeeded;
>>>> +		*ret_succeeded = stats.nr_succeeded;
>>>>  
>>>>  	return rc;
>>>>  }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/8] migrate_pages: separate hugetlb folios migration
  2023-01-05  6:43       ` Alistair Popple
@ 2023-01-05  7:31         ` Huang, Ying
  2023-01-05  7:39           ` Alistair Popple
  0 siblings, 1 reply; 39+ messages in thread
From: Huang, Ying @ 2023-01-05  7:31 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin


[snip]

>
>>>> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  		nr_retry_pages = 0;
>>>>  
>>>>  		list_for_each_entry_safe(folio, folio2, from, lru) {
>>>> +			if (folio_test_hugetlb(folio)) {
>>>
>>> How do we hit this case? Shouldn't migrate_hugetlbs() have already moved
>>> any hugetlb folios off the from list?
>>
>> Retried hugetlb folios will be kept in from list.
>
> Couldn't migrate_hugetlbs() remove the failing retried pages from the
> list on the final pass? That seems cleaner to me.

To do that, we need to go through the folio list again to remove all
hugetlb pages.  It could be time-consuming in some cases.  So I think
that it's better to keep this.
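
Just to illustrate what I mean, that would be roughly an extra walk
like the following at the end of migrate_hugetlbs() (sketch only):

	/* hypothetical cleanup pass to drop remaining hugetlb folios */
	list_for_each_entry_safe(folio, folio2, from, lru) {
		if (folio_test_hugetlb(folio))
			list_move_tail(&folio->lru, ret_folios);
	}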

Best Regards,
Huang, Ying

>>>> +				list_move_tail(&folio->lru, &ret_folios);
>>>> +				continue;
>>>> +			}
>>>> +
>>>>  			/*
>>>>  			 * Large folio statistics is based on the source large
>>>>  			 * folio. Capture required information that might get
>>>>  			 * lost during migration.
>>>>  			 */
>>>> -			is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
>>>> +			is_large = folio_test_large(folio);
>>>>  			is_thp = is_large && folio_test_pmd_mappable(folio);
>>>>  			nr_pages = folio_nr_pages(folio);
>>>> +
>>>>  			cond_resched();
>>>>  
>>>> -			if (folio_test_hugetlb(folio))
>>>> -				rc = unmap_and_move_huge_page(get_new_page,
>>>> -						put_new_page, private,
>>>> -						&folio->page, pass > 2, mode,
>>>> -						reason,
>>>> -						&ret_folios);
>>>> -			else
>>>> -				rc = unmap_and_move(get_new_page, put_new_page,
>>>> -						private, folio, pass > 2, mode,
>>>> -						reason, &ret_folios);
>>>> +			rc = unmap_and_move(get_new_page, put_new_page,
>>>> +					    private, folio, pass > 2, mode,
>>>> +					    reason, &ret_folios);
>>>>  			/*
>>>>  			 * The rules are:
>>>> -			 *	Success: non hugetlb folio will be freed, hugetlb
>>>> -			 *		 folio will be put back
>>>> +			 *	Success: folio will be freed
>>>>  			 *	-EAGAIN: stay on the from list
>>>>  			 *	-ENOMEM: stay on the from list
>>>>  			 *	-ENOSYS: stay on the from list
>>>> @@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>  						stats.nr_thp_split += is_thp;
>>>>  						break;
>>>>  					}
>>>> -				/* Hugetlb migration is unsupported */
>>>>  				} else if (!no_split_folio_counting) {
>>>>  					nr_failed++;
>>>>  				}

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/8] migrate_pages: separate hugetlb folios migration
  2023-01-05  7:31         ` Huang, Ying
@ 2023-01-05  7:39           ` Alistair Popple
  2023-01-09  7:23             ` Huang, Ying
  0 siblings, 1 reply; 39+ messages in thread
From: Alistair Popple @ 2023-01-05  7:39 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin


"Huang, Ying" <ying.huang@intel.com> writes:

> [snip]
>
>>
>>>>> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>>  		nr_retry_pages = 0;
>>>>>  
>>>>>  		list_for_each_entry_safe(folio, folio2, from, lru) {
>>>>> +			if (folio_test_hugetlb(folio)) {
>>>>
>>>> How do we hit this case? Shouldn't migrate_hugetlbs() have already moved
>>>> any hugetlb folios off the from list?
>>>
>>> Retried hugetlb folios will be kept in from list.
>>
>> Couldn't migrate_hugetlbs() remove the failing retried pages from the
>> list on the final pass? That seems cleaner to me.
>
> To do that, we need to go through the folio list again to remove all
> hugetlb pages.  It could be time-consuming in some cases.  So I think
> that it's better to keep this.

Why? Couldn't we test pass == 9 and remove it from the list if it fails
the final retry in migrate_hugetlbs()? In any case, if it's on the list
due to failed retries we have already passed over it 10 times, so the
extra loop hardly seems like a problem.
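
Something like this in the -EAGAIN leg, for example (untested sketch,
keeping the failure accounting that the out: path does today):

	case -EAGAIN:
		if (pass == 9) {
			/*
			 * Final pass: count the failure here and drop the
			 * folio from the from list so the caller never
			 * sees hugetlb folios on it.
			 */
			nr_failed++;
			stats->nr_failed_pages += nr_pages;
			list_move_tail(&folio->lru, ret_folios);
		} else {
			retry++;
			nr_retry_pages += nr_pages;
		}
		break;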

 - Alistair

> Best Regards,
> Huang, Ying
>
>>>>> +				list_move_tail(&folio->lru, &ret_folios);
>>>>> +				continue;
>>>>> +			}
>>>>> +
>>>>>  			/*
>>>>>  			 * Large folio statistics is based on the source large
>>>>>  			 * folio. Capture required information that might get
>>>>>  			 * lost during migration.
>>>>>  			 */
>>>>> -			is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
>>>>> +			is_large = folio_test_large(folio);
>>>>>  			is_thp = is_large && folio_test_pmd_mappable(folio);
>>>>>  			nr_pages = folio_nr_pages(folio);
>>>>> +
>>>>>  			cond_resched();
>>>>>  
>>>>> -			if (folio_test_hugetlb(folio))
>>>>> -				rc = unmap_and_move_huge_page(get_new_page,
>>>>> -						put_new_page, private,
>>>>> -						&folio->page, pass > 2, mode,
>>>>> -						reason,
>>>>> -						&ret_folios);
>>>>> -			else
>>>>> -				rc = unmap_and_move(get_new_page, put_new_page,
>>>>> -						private, folio, pass > 2, mode,
>>>>> -						reason, &ret_folios);
>>>>> +			rc = unmap_and_move(get_new_page, put_new_page,
>>>>> +					    private, folio, pass > 2, mode,
>>>>> +					    reason, &ret_folios);
>>>>>  			/*
>>>>>  			 * The rules are:
>>>>> -			 *	Success: non hugetlb folio will be freed, hugetlb
>>>>> -			 *		 folio will be put back
>>>>> +			 *	Success: folio will be freed
>>>>>  			 *	-EAGAIN: stay on the from list
>>>>>  			 *	-ENOMEM: stay on the from list
>>>>>  			 *	-ENOSYS: stay on the from list
>>>>> @@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>>  						stats.nr_thp_split += is_thp;
>>>>>  						break;
>>>>>  					}
>>>>> -				/* Hugetlb migration is unsupported */
>>>>>  				} else if (!no_split_folio_counting) {
>>>>>  					nr_failed++;
>>>>>  				}


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/8] migrate_pages: split unmap_and_move() to _unmap() and _move()
  2022-12-27  0:28 ` [PATCH 4/8] migrate_pages: split unmap_and_move() to _unmap() and _move() Huang Ying
  2023-01-03 18:55   ` Zi Yan
@ 2023-01-05 18:26   ` Nathan Chancellor
  2023-01-05 18:57     ` Kees Cook
  1 sibling, 1 reply; 39+ messages in thread
From: Nathan Chancellor @ 2023-01-05 18:26 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	Alistair Popple, haoxin, Kees Cook

Hi Ying,

On Tue, Dec 27, 2022 at 08:28:55AM +0800, Huang Ying wrote:
> This is a preparation patch to batch the folio unmapping and moving.
> 
> In this patch, unmap_and_move() is split to migrate_folio_unmap() and
> migrate_folio_move().  So, we can batch _unmap() and _move() in
> different loops later.  To pass some information between unmap and
> move, the original unused dst->mapping and dst->private are used.
> 
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Bharata B Rao <bharata@amd.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: haoxin <xhao@linux.alibaba.com>
> ---
>  include/linux/migrate.h |   1 +
>  mm/migrate.c            | 162 +++++++++++++++++++++++++++++-----------
>  2 files changed, 121 insertions(+), 42 deletions(-)
> 
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 3ef77f52a4f0..7376074f2e1e 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -18,6 +18,7 @@ struct migration_target_control;
>   * - zero on page migration success;
>   */
>  #define MIGRATEPAGE_SUCCESS		0
> +#define MIGRATEPAGE_UNMAP		1
>  
>  /**
>   * struct movable_operations - Driver page migration
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 97ea0737ab2b..e2383b430932 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1009,11 +1009,29 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
>  	return rc;
>  }
>  
> -static int __unmap_and_move(struct folio *src, struct folio *dst,
> +static void __migrate_folio_record(struct folio *dst,
> +				   unsigned long page_was_mapped,
> +				   struct anon_vma *anon_vma)
> +{
> +	dst->mapping = (struct address_space *)anon_vma;
> +	dst->private = (void *)page_was_mapped;
> +}
> +
> +static void __migrate_folio_extract(struct folio *dst,
> +				   int *page_was_mappedp,
> +				   struct anon_vma **anon_vmap)
> +{
> +	*anon_vmap = (struct anon_vma *)dst->mapping;
> +	*page_was_mappedp = (unsigned long)dst->private;
> +	dst->mapping = NULL;
> +	dst->private = NULL;
> +}

This patch as commit 42871c600cad ("migrate_pages: split
unmap_and_move() to _unmap() and _move()") in next-20230105 causes the
following error with clang when CONFIG_RANDSTRUCT is enabled, which is
the case with allmodconfig:

  ../mm/migrate.c:1041:15: error: casting from randomized structure pointer type 'struct address_space *' to 'struct anon_vma *'
          *anon_vmap = (struct anon_vma *)dst->mapping;
                       ^
  1 error generated.

With GCC, there is only a note:

  ../mm/migrate.c: In function '__migrate_folio_extract':
  ../mm/migrate.c:1041:20: note: randstruct: casting between randomized structure pointer types (ssa): 'struct anon_vma' and 'struct address_space'

   1041 |         *anon_vmap = (struct anon_vma *)dst->mapping;
        |         ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Kees has done fixes for warnings and errors like this in the past (I
just ran

  $ git log -p --grep='randomized structure pointer type'

to find them) but I did not see any that would seem appropriate here
hence just the report :)

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/8] migrate_pages: split unmap_and_move() to _unmap() and _move()
  2023-01-05 18:26   ` Nathan Chancellor
@ 2023-01-05 18:57     ` Kees Cook
  2023-01-08 23:33       ` Huang, Ying
  0 siblings, 1 reply; 39+ messages in thread
From: Kees Cook @ 2023-01-05 18:57 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel, Zi Yan,
	Yang Shi, Baolin Wang, Oscar Salvador, Matthew Wilcox,
	Bharata B Rao, Alistair Popple, haoxin

On Thu, Jan 05, 2023 at 11:26:55AM -0700, Nathan Chancellor wrote:
> Hi Ying,
> 
> On Tue, Dec 27, 2022 at 08:28:55AM +0800, Huang Ying wrote:
> > This is a preparation patch to batch the folio unmapping and moving.
> > 
> > In this patch, unmap_and_move() is split to migrate_folio_unmap() and
> > migrate_folio_move().  So, we can batch _unmap() and _move() in
> > different loops later.  To pass some information between unmap and
> > move, the original unused dst->mapping and dst->private are used.
> > 
> > Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> > Cc: Zi Yan <ziy@nvidia.com>
> > Cc: Yang Shi <shy828301@gmail.com>
> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Cc: Oscar Salvador <osalvador@suse.de>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: Bharata B Rao <bharata@amd.com>
> > Cc: Alistair Popple <apopple@nvidia.com>
> > Cc: haoxin <xhao@linux.alibaba.com>
> > ---
> >  include/linux/migrate.h |   1 +
> >  mm/migrate.c            | 162 +++++++++++++++++++++++++++++-----------
> >  2 files changed, 121 insertions(+), 42 deletions(-)
> > 
> > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > index 3ef77f52a4f0..7376074f2e1e 100644
> > --- a/include/linux/migrate.h
> > +++ b/include/linux/migrate.h
> > @@ -18,6 +18,7 @@ struct migration_target_control;
> >   * - zero on page migration success;
> >   */
> >  #define MIGRATEPAGE_SUCCESS		0
> > +#define MIGRATEPAGE_UNMAP		1
> >  
> >  /**
> >   * struct movable_operations - Driver page migration
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 97ea0737ab2b..e2383b430932 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1009,11 +1009,29 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
> >  	return rc;
> >  }
> >  
> > -static int __unmap_and_move(struct folio *src, struct folio *dst,
> > +static void __migrate_folio_record(struct folio *dst,
> > +				   unsigned long page_was_mapped,
> > +				   struct anon_vma *anon_vma)
> > +{
> > +	dst->mapping = (struct address_space *)anon_vma;
> > +	dst->private = (void *)page_was_mapped;
> > +}
> > +
> > +static void __migrate_folio_extract(struct folio *dst,
> > +				   int *page_was_mappedp,
> > +				   struct anon_vma **anon_vmap)
> > +{
> > +	*anon_vmap = (struct anon_vma *)dst->mapping;
> > +	*page_was_mappedp = (unsigned long)dst->private;
> > +	dst->mapping = NULL;
> > +	dst->private = NULL;
> > +}
> 
> This patch as commit 42871c600cad ("migrate_pages: split
> unmap_and_move() to _unmap() and _move()") in next-20230105 causes the
> following error with clang when CONFIG_RANDSTRUCT is enabled, which is
> the case with allmodconfig:
> 
>   ../mm/migrate.c:1041:15: error: casting from randomized structure pointer type 'struct address_space *' to 'struct anon_vma *'
>           *anon_vmap = (struct anon_vma *)dst->mapping;
>                        ^
>   1 error generated.
> 
> With GCC, there is only a note:
> 
>   ../mm/migrate.c: In function '__migrate_folio_extract':
>   ../mm/migrate.c:1041:20: note: randstruct: casting between randomized structure pointer types (ssa): 'struct anon_vma' and 'struct address_space'
> 
>    1041 |         *anon_vmap = (struct anon_vma *)dst->mapping;
>         |         ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Kees has done fixes for warnings and errors like this in the past (I
> just ran
> 
>   $ git log -p --grep='randomized structure pointer type'
> 
> to find them) but I did not see any that would seem appropriate here
> hence just the report :)

If this struct is literally just a scratch space and the original struct
layout doesn't matter, it may be possible to silence this cast by using
"(void *)" instead of the explicit struct type pointer.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/8] migrate_pages: share more code between _unmap and _move
  2023-01-04  7:12   ` Alistair Popple
@ 2023-01-06  4:15     ` Huang, Ying
  0 siblings, 0 replies; 39+ messages in thread
From: Huang, Ying @ 2023-01-06  4:15 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin

Alistair Popple <apopple@nvidia.com> writes:

> Huang Ying <ying.huang@intel.com> writes:
>
>> This is a code cleanup patch to reduce the duplicated code between the
>> _unmap and _move stages of migrate_pages().  No functionality change
>> is expected.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Bharata B Rao <bharata@amd.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: haoxin <xhao@linux.alibaba.com>
>> ---
>>  mm/migrate.c | 208 ++++++++++++++++++++-------------------------------
>>  1 file changed, 82 insertions(+), 126 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 70b987391296..70a40b8fee1f 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1030,21 +1030,26 @@ static void __migrate_folio_extract(struct folio *dst,
>>  static void migrate_folio_undo_src(struct folio *src,
>>  				   int page_was_mapped,
>>  				   struct anon_vma *anon_vma,
>> +				   bool locked,
>>  				   struct list_head *ret)
>>  {
>>  	if (page_was_mapped)
>>  		remove_migration_ptes(src, src, false);
>>  	if (anon_vma)
>>  		put_anon_vma(anon_vma);
>> -	folio_unlock(src);
>> -	list_move_tail(&src->lru, ret);
>> +	if (locked)
>> +		folio_unlock(src);
>> +	if (ret)
>> +		list_move_tail(&src->lru, ret);
>>  }
>>  
>>  static void migrate_folio_undo_dst(struct folio *dst,
>> +				   bool locked,
>>  				   free_page_t put_new_page,
>>  				   unsigned long private)
>>  {
>> -	folio_unlock(dst);
>> +	if (locked)
>> +		folio_unlock(dst);
>>  	if (put_new_page)
>>  		put_new_page(&dst->page, private);
>>  	else
>> @@ -1068,14 +1073,44 @@ static void migrate_folio_done(struct folio *src,
>>  		folio_put(src);
>>  }
>>  
>> -static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>> -				 int force, bool force_lock, enum migrate_mode mode)
>> +/* Obtain the lock on page, remove all ptes. */
>> +static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
>> +			       unsigned long private, struct folio *src,
>> +			       struct folio **dstp, int force, bool force_lock,
>> +			       enum migrate_mode mode, enum migrate_reason reason,
>> +			       struct list_head *ret)
>
> Overall I think this should be refactored into some smaller, simpler
> functions as the error handling and the giant switch statement in
> migrate_pages_batch() is making my head hurt :-)

Yes.  Well-defined smaller functions are good.

From another point of view, code cleanup isn't the main purpose of the
patchset.  We can try to clean up the code after this patchset too.  And
we may have some new ideas at that time.

>>  {
>> -	int rc = -EAGAIN;
>> +	struct folio *dst;
>> +	int rc = MIGRATEPAGE_UNMAP;
>> +	struct page *newpage = NULL;
>>  	int page_was_mapped = 0;
>>  	struct anon_vma *anon_vma = NULL;
>>  	bool is_lru = !__PageMovable(&src->page);
>> +	bool locked = false;
>> +	bool dst_locked = false;
>> +
>> +	if (!thp_migration_supported() && folio_test_transhuge(src))
>> +		return -ENOSYS;
>
> This would be easier to follow if it was just moved to the caller and
> the -ENOSYS switch case removed.

Good idea!  I tried this; it reduces the number of lines in
migrate_pages_batch().
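
Roughly, the check moves into the caller's folio loop and gets handled
the same way as the -ENOSYS case is today, e.g. (sketch only, written
against the migrate_pages() loop quoted earlier; the exact form in
migrate_pages_batch() may differ):

	if (!thp_migration_supported() && folio_test_transhuge(folio)) {
		nr_large_failed++;
		stats.nr_thp_failed += is_thp;
		/* THP migration is unsupported: try to split, else fail */
		if (!try_split_folio(folio, &split_folios)) {
			stats.nr_thp_split += is_thp;
			continue;
		}
		stats.nr_failed_pages += nr_pages;
		list_move_tail(&folio->lru, &ret_folios);
		continue;
	}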

>> +	if (folio_ref_count(src) == 1) {
>> +		/* Folio was freed from under us. So we are done. */
>> +		folio_clear_active(src);
>> +		folio_clear_unevictable(src);
>> +		/* free_pages_prepare() will clear PG_isolated. */
>> +		list_del(&src->lru);
>> +		migrate_folio_done(src, reason);
>> +		return MIGRATEPAGE_SUCCESS;
>> +	}
>
> This is the only case that returns MIGRATEPAGE_SUCCESS so would also be
> clearer if moved to the caller eliminating another switch case.

This can reduce the number of lines in migrate_folio_unmap(), but it
will increase the number of lines in migrate_pages_batch(), which is
already a long function.  I am hesitant to make it even longer.

>> +
>> +	newpage = get_new_page(&src->page, private);
>> +	if (!newpage)
>> +		return -ENOMEM;
>> +	dst = page_folio(newpage);
>> +	*dstp = dst;
>> +
>> +	dst->private = NULL;
>
> This could be moved to after the folio_test_writeback(), which might
> make the split I suggest below easier.
>
>>  
>> +	rc = -EAGAIN;
>
> We can just initialise rc to -EAGAIN.

OK.

>>  	if (!folio_trylock(src)) {
>>  		if (!force || mode == MIGRATE_ASYNC)
>>  			goto out;
>> @@ -1103,6 +1138,7 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>>  
>>  		folio_lock(src);
>>  	}
>> +	locked = true;
>
> A separate helper for locking the folio would be better IMHO.
>
>>  
>>  	if (folio_test_writeback(src)) {
>>  		/*
>> @@ -1117,10 +1153,10 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>>  			break;
>>  		default:
>>  			rc = -EBUSY;
>> -			goto out_unlock;
>> +			goto out;
>>  		}
>>  		if (!force)
>> -			goto out_unlock;
>> +			goto out;
>>  		folio_wait_writeback(src);
>>  	}
>
> This is the only path that returns -EBUSY, so could be integrated into
> the helper suggested above for locking the folio.
>   
>> @@ -1150,7 +1186,8 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>>  	 * This is much like races on refcount of oldpage: just don't BUG().
>>  	 */
>>  	if (unlikely(!folio_trylock(dst)))
>> -		goto out_unlock;
>> +		goto out;
>> +	dst_locked = true;
>
> So how about splitting migrate_folio_unmap() into two functions:
>
> /*
>  * Prepare a folio for migration by locking the source, ensuring
>  * writeback is complete and allocating and locking a new destination
>  * page.
>  */
> migrate_folio_prepare(new_page_t get_new_page, free_page_t put_new_page,
>                       unsigned long private, struct folio *src,
> 	              struct folio **dstp, int force, bool force_lock)
>
> migrate_folio_unmap(struct folio *src, struct folio *dst,
>                     enum migrate_mode mode, enum migrate_reason reason)
>
> Obviously we still have the various failure scenarios to deal with, but
> I think it would be more readable if these were limited to undoing the
> migrate_folio_prepare() step in the caller. I think the list
> manipulation would also be more obvious if left to the caller.

migrate_folio_prepare() doesn't look like a good abstraction to me.
And I am hesitant to increase the length of migrate_pages_batch() any
further.

>>  	if (unlikely(!is_lru)) {
>>  		__migrate_folio_record(dst, page_was_mapped, anon_vma);
>> @@ -1172,7 +1209,7 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>>  	if (!src->mapping) {
>>  		if (folio_test_private(src)) {
>>  			try_to_free_buffers(src);
>> -			goto out_unlock_both;
>> +			goto out;
>>  		}
>>  	} else if (folio_mapped(src)) {
>>  		/* Establish migration ptes */
>> @@ -1187,75 +1224,27 @@ static int __migrate_folio_unmap(struct folio *src, struct folio *dst,
>>  		return MIGRATEPAGE_UNMAP;
>
> I realise this is pre-existing but the mixing of setting rc or
> returning codes directly is a bit hard to follow.

In fact this is the success case; the code below is the error
processing.  Setting rc and jumping with goto is reserved for the
failure cases.  I can change all the failure cases to set rc and goto.

Best Regards,
Huang, Ying

>>  	}
>>  
>> -
>> -	if (page_was_mapped)
>> -		remove_migration_ptes(src, src, false);
>> -
>> -out_unlock_both:
>> -	folio_unlock(dst);
>> -out_unlock:
>> -	/* Drop an anon_vma reference if we took one */
>> -	if (anon_vma)
>> -		put_anon_vma(anon_vma);
>> -	folio_unlock(src);
>>  out:
>> -
>> -	return rc;
>> -}
>> -
>> -/* Obtain the lock on page, remove all ptes. */
>> -static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
>> -			       unsigned long private, struct folio *src,
>> -			       struct folio **dstp, int force, bool force_lock,
>> -			       enum migrate_mode mode, enum migrate_reason reason,
>> -			       struct list_head *ret)
>> -{
>> -	struct folio *dst;
>> -	int rc = MIGRATEPAGE_UNMAP;
>> -	struct page *newpage = NULL;
>> -
>> -	if (!thp_migration_supported() && folio_test_transhuge(src))
>> -		return -ENOSYS;
>> -
>> -	if (folio_ref_count(src) == 1) {
>> -		/* Folio was freed from under us. So we are done. */
>> -		folio_clear_active(src);
>> -		folio_clear_unevictable(src);
>> -		/* free_pages_prepare() will clear PG_isolated. */
>> -		list_del(&src->lru);
>> -		migrate_folio_done(src, reason);
>> -		return MIGRATEPAGE_SUCCESS;
>> -	}
>> -
>> -	newpage = get_new_page(&src->page, private);
>> -	if (!newpage)
>> -		return -ENOMEM;
>> -	dst = page_folio(newpage);
>> -	*dstp = dst;
>> -
>> -	dst->private = NULL;
>> -	rc = __migrate_folio_unmap(src, dst, force, force_lock, mode);
>> -	if (rc == MIGRATEPAGE_UNMAP)
>> -		return rc;
>> -
>>  	/*
>>  	 * A page that has not been migrated will have kept its
>>  	 * references and be restored.
>>  	 */
>>  	/* restore the folio to right list. */
>> -	if (rc != -EAGAIN && rc != -EDEADLOCK)
>> -		list_move_tail(&src->lru, ret);
>> +	if (rc == -EAGAIN || rc == -EDEADLOCK)
>> +		ret = NULL;
>>  
>> -	if (put_new_page)
>> -		put_new_page(&dst->page, private);
>> -	else
>> -		folio_put(dst);
>> +	migrate_folio_undo_src(src, page_was_mapped, anon_vma, locked, ret);
>> +	if (dst)
>> +		migrate_folio_undo_dst(dst, dst_locked, put_new_page, private);
>>  
>>  	return rc;
>>  }
>>  
>> -static int __migrate_folio_move(struct folio *src, struct folio *dst,
>> -				enum migrate_mode mode)
>> +/* Migrate the folio to the newly allocated folio in dst. */
>> +static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
>> +			      struct folio *src, struct folio *dst,
>> +			      enum migrate_mode mode, enum migrate_reason reason,
>> +			      struct list_head *ret)
>>  {
>>  	int rc;
>>  	int page_was_mapped = 0;
>> @@ -1264,9 +1253,10 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
>>  	__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
>>  
>>  	rc = move_to_new_folio(dst, src, mode);
>> +	if (rc)
>> +		goto out;
>>  
>> -	if (rc != -EAGAIN)
>> -		list_del(&dst->lru);
>> +	list_del(&dst->lru);
>>  	/*
>>  	 * When successful, push dst to LRU immediately: so that if it
>>  	 * turns out to be an mlocked page, remove_migration_ptes() will
>> @@ -1276,74 +1266,40 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
>>  	 * unsuccessful, and other cases when a page has been temporarily
>>  	 * isolated from the unevictable LRU: but this case is the easiest.
>>  	 */
>> -	if (rc == MIGRATEPAGE_SUCCESS) {
>> -		folio_add_lru(dst);
>> -		if (page_was_mapped)
>> -			lru_add_drain();
>> -	}
>> -
>> -	if (rc == -EAGAIN) {
>> -		__migrate_folio_record(dst, page_was_mapped, anon_vma);
>> -		return rc;
>> -	}
>> -
>> +	folio_add_lru(dst);
>>  	if (page_was_mapped)
>> -		remove_migration_ptes(src,
>> -			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
>> +		lru_add_drain();
>>  
>> +	if (page_was_mapped)
>> +		remove_migration_ptes(src, dst, false);
>>  	folio_unlock(dst);
>> -	/* Drop an anon_vma reference if we took one */
>> -	if (anon_vma)
>> -		put_anon_vma(anon_vma);
>> -	folio_unlock(src);
>> +	set_page_owner_migrate_reason(&dst->page, reason);
>>  	/*
>>  	 * If migration is successful, decrease refcount of dst,
>>  	 * which will not free the page because new page owner increased
>>  	 * refcounter.
>>  	 */
>> -	if (rc == MIGRATEPAGE_SUCCESS)
>> -		folio_put(dst);
>> -
>> -	return rc;
>> -}
>> -
>> -/* Migrate the folio to the newly allocated folio in dst. */
>> -static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
>> -			      struct folio *src, struct folio *dst,
>> -			      enum migrate_mode mode, enum migrate_reason reason,
>> -			      struct list_head *ret)
>> -{
>> -	int rc;
>> -
>> -	rc = __migrate_folio_move(src, dst, mode);
>> -	if (rc == MIGRATEPAGE_SUCCESS)
>> -		set_page_owner_migrate_reason(&dst->page, reason);
>> -
>> -	if (rc != -EAGAIN) {
>> -		/*
>> -		 * A folio that has been migrated has all references
>> -		 * removed and will be freed. A folio that has not been
>> -		 * migrated will have kept its references and be restored.
>> -		 */
>> -		list_del(&src->lru);
>> -	}
>> +	folio_put(dst);
>>  
>>  	/*
>> -	 * If migration is successful, releases reference grabbed during
>> -	 * isolation. Otherwise, restore the folio to right list unless
>> -	 * we want to retry.
>> +	 * A page that has been migrated has all references removed
>> +	 * and will be freed.
>>  	 */
>> -	if (rc == MIGRATEPAGE_SUCCESS) {
>> -		migrate_folio_done(src, reason);
>> -	} else if (rc != -EAGAIN) {
>> -		list_add_tail(&src->lru, ret);
>> +	list_del(&src->lru);
>> +	migrate_folio_undo_src(src, 0, anon_vma, true, NULL);
>> +	migrate_folio_done(src, reason);
>>  
>> -		if (put_new_page)
>> -			put_new_page(&dst->page, private);
>> -		else
>> -			folio_put(dst);
>> +	return rc;
>> +out:
>> +	if (rc == -EAGAIN) {
>> +		__migrate_folio_record(dst, page_was_mapped, anon_vma);
>> +		return rc;
>>  	}
>>  
>> +	migrate_folio_undo_src(src, page_was_mapped, anon_vma, true, ret);
>> +	list_del(&dst->lru);
>> +	migrate_folio_undo_dst(dst, true, put_new_page, private);
>> +
>>  	return rc;
>>  }
>>  
>> @@ -1849,9 +1805,9 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>>  
>>  		__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
>>  		migrate_folio_undo_src(folio, page_was_mapped, anon_vma,
>> -				       ret_folios);
>> +				       true, ret_folios);
>>  		list_del(&dst->lru);
>> -		migrate_folio_undo_dst(dst, put_new_page, private);
>> +		migrate_folio_undo_dst(dst, true, put_new_page, private);
>>  		dst = dst2;
>>  		dst2 = list_next_entry(dst, lru);
>>  	}

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/8] migrate_pages: split unmap_and_move() to _unmap() and _move()
  2023-01-05 18:57     ` Kees Cook
@ 2023-01-08 23:33       ` Huang, Ying
  0 siblings, 0 replies; 39+ messages in thread
From: Huang, Ying @ 2023-01-08 23:33 UTC (permalink / raw)
  To: Kees Cook
  Cc: Nathan Chancellor, Andrew Morton, linux-mm, linux-kernel, Zi Yan,
	Yang Shi, Baolin Wang, Oscar Salvador, Matthew Wilcox,
	Bharata B Rao, Alistair Popple, haoxin

Kees Cook <keescook@chromium.org> writes:

> On Thu, Jan 05, 2023 at 11:26:55AM -0700, Nathan Chancellor wrote:
>> Hi Ying,
>> 
>> On Tue, Dec 27, 2022 at 08:28:55AM +0800, Huang Ying wrote:
>> > This is a preparation patch to batch the folio unmapping and moving.
>> > 
>> > In this patch, unmap_and_move() is split to migrate_folio_unmap() and
>> > migrate_folio_move().  So, we can batch _unmap() and _move() in
>> > different loops later.  To pass some information between unmap and
>> > move, the original unused dst->mapping and dst->private are used.
>> > 
>> > Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> > Cc: Zi Yan <ziy@nvidia.com>
>> > Cc: Yang Shi <shy828301@gmail.com>
>> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>> > Cc: Oscar Salvador <osalvador@suse.de>
>> > Cc: Matthew Wilcox <willy@infradead.org>
>> > Cc: Bharata B Rao <bharata@amd.com>
>> > Cc: Alistair Popple <apopple@nvidia.com>
>> > Cc: haoxin <xhao@linux.alibaba.com>
>> > ---
>> >  include/linux/migrate.h |   1 +
>> >  mm/migrate.c            | 162 +++++++++++++++++++++++++++++-----------
>> >  2 files changed, 121 insertions(+), 42 deletions(-)
>> > 
>> > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>> > index 3ef77f52a4f0..7376074f2e1e 100644
>> > --- a/include/linux/migrate.h
>> > +++ b/include/linux/migrate.h
>> > @@ -18,6 +18,7 @@ struct migration_target_control;
>> >   * - zero on page migration success;
>> >   */
>> >  #define MIGRATEPAGE_SUCCESS		0
>> > +#define MIGRATEPAGE_UNMAP		1
>> >  
>> >  /**
>> >   * struct movable_operations - Driver page migration
>> > diff --git a/mm/migrate.c b/mm/migrate.c
>> > index 97ea0737ab2b..e2383b430932 100644
>> > --- a/mm/migrate.c
>> > +++ b/mm/migrate.c
>> > @@ -1009,11 +1009,29 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
>> >  	return rc;
>> >  }
>> >  
>> > -static int __unmap_and_move(struct folio *src, struct folio *dst,
>> > +static void __migrate_folio_record(struct folio *dst,
>> > +				   unsigned long page_was_mapped,
>> > +				   struct anon_vma *anon_vma)
>> > +{
>> > +	dst->mapping = (struct address_space *)anon_vma;
>> > +	dst->private = (void *)page_was_mapped;
>> > +}
>> > +
>> > +static void __migrate_folio_extract(struct folio *dst,
>> > +				   int *page_was_mappedp,
>> > +				   struct anon_vma **anon_vmap)
>> > +{
>> > +	*anon_vmap = (struct anon_vma *)dst->mapping;
>> > +	*page_was_mappedp = (unsigned long)dst->private;
>> > +	dst->mapping = NULL;
>> > +	dst->private = NULL;
>> > +}
>> 
>> This patch as commit 42871c600cad ("migrate_pages: split
>> unmap_and_move() to _unmap() and _move()") in next-20230105 causes the
>> following error with clang when CONFIG_RANDSTRUCT is enabled, which is
>> the case with allmodconfig:
>> 
>>   ../mm/migrate.c:1041:15: error: casting from randomized structure pointer type 'struct address_space *' to 'struct anon_vma *'
>>           *anon_vmap = (struct anon_vma *)dst->mapping;
>>                        ^
>>   1 error generated.
>> 
>> With GCC, there is only a note:
>> 
>>   ../mm/migrate.c: In function '__migrate_folio_extract':
>>   ../mm/migrate.c:1041:20: note: randstruct: casting between randomized structure pointer types (ssa): 'struct anon_vma' and 'struct address_space'
>> 
>>    1041 |         *anon_vmap = (struct anon_vma *)dst->mapping;
>>         |         ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 
>> Kees has done fixes for warnings and errors like this in the past (I
>> just ran
>> 
>>   $ git log -p --grep='randomized structure pointer type'
>> 
>> to find them) but I did not see any that would seem appropriate here
>> hence just the report :)
>
> If this struct is literally just a scratch space and the original struct
> layout doesn't matter, it may be possible to silence this cast by using
> "(void *)" instead of the explicit struct type pointer.

It works!  Thank you very much!
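
For the record, the fix is roughly the following; a sketch of the change
folded into the next version, where only the cast through (void *)
differs from the hunk quoted above:

static void __migrate_folio_extract(struct folio *dst,
                                    int *page_was_mappedp,
                                    struct anon_vma **anon_vmap)
{
        /* Cast via void * so RANDSTRUCT does not see a cast between two
         * randomized structure pointer types. */
        *anon_vmap = (void *)dst->mapping;
        *page_was_mappedp = (unsigned long)dst->private;
        dst->mapping = NULL;
        dst->private = NULL;
}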

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/8] migrate_pages: separate hugetlb folios migration
  2023-01-05  7:39           ` Alistair Popple
@ 2023-01-09  7:23             ` Huang, Ying
  2023-01-10  1:37               ` Alistair Popple
  0 siblings, 1 reply; 39+ messages in thread
From: Huang, Ying @ 2023-01-09  7:23 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin

Alistair Popple <apopple@nvidia.com> writes:

> "Huang, Ying" <ying.huang@intel.com> writes:
>
>> [snip]
>>
>>>
>>>>>> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>>>  		nr_retry_pages = 0;
>>>>>>  
>>>>>>  		list_for_each_entry_safe(folio, folio2, from, lru) {
>>>>>> +			if (folio_test_hugetlb(folio)) {
>>>>>
>>>>> How do we hit this case? Shouldn't migrate_hugetlbs() have already moved
>>>>> any hugetlb folios off the from list?
>>>>
>>>> Retried hugetlb folios will be kept in from list.
>>>
>>> Couldn't migrate_hugetlbs() remove the failing retried pages from the
>>> list on the final pass? That seems cleaner to me.
>>
>> To do that, we need to go through the folio list again to remove all
>> hugetlb pages.  It could be time-consuming in some cases.  So I think
>> that it's better to keep this.
>
> Why? Couldn't we test pass == 9 and remove it from the list if it fails
> the final retry in migrate_hugetlbs()? In any case if it's on the list
> due to failed retries we have already passed over it 10 times, so the
> extra loop hardly seems like a problem.

Yes.  That's possible.  But "test pass == 9" looks more tricky than the
current code.

Feel free to change the code as you suggested on top of this series.  If
no one else objects, I'm OK with that.  OK?
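
For reference, the suggested change would land in migrate_hugetlbs()'s
-EAGAIN handling roughly as below.  This is a sketch of the idea only;
the counter updates and list destination are assumptions:

        case -EAGAIN:
                if (pass == 9) {
                        /*
                         * Final retry failed: take the hugetlb folio off
                         * the from list here so that the non-hugetlb loop
                         * below does not have to skip it.
                         */
                        nr_failed++;
                        stats->nr_failed_pages += nr_pages;
                        list_move_tail(&folio->lru, ret_folios);
                        break;
                }
                retry++;
                nr_retry_pages += nr_pages;
                break;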

Best Regards,
Huang, Ying

>>
>>>>>> +				list_move_tail(&folio->lru, &ret_folios);
>>>>>> +				continue;
>>>>>> +			}
>>>>>> +
>>>>>>  			/*
>>>>>>  			 * Large folio statistics is based on the source large
>>>>>>  			 * folio. Capture required information that might get
>>>>>>  			 * lost during migration.
>>>>>>  			 */
>>>>>> -			is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
>>>>>> +			is_large = folio_test_large(folio);
>>>>>>  			is_thp = is_large && folio_test_pmd_mappable(folio);
>>>>>>  			nr_pages = folio_nr_pages(folio);
>>>>>> +
>>>>>>  			cond_resched();
>>>>>>  
>>>>>> -			if (folio_test_hugetlb(folio))
>>>>>> -				rc = unmap_and_move_huge_page(get_new_page,
>>>>>> -						put_new_page, private,
>>>>>> -						&folio->page, pass > 2, mode,
>>>>>> -						reason,
>>>>>> -						&ret_folios);
>>>>>> -			else
>>>>>> -				rc = unmap_and_move(get_new_page, put_new_page,
>>>>>> -						private, folio, pass > 2, mode,
>>>>>> -						reason, &ret_folios);
>>>>>> +			rc = unmap_and_move(get_new_page, put_new_page,
>>>>>> +					    private, folio, pass > 2, mode,
>>>>>> +					    reason, &ret_folios);
>>>>>>  			/*
>>>>>>  			 * The rules are:
>>>>>> -			 *	Success: non hugetlb folio will be freed, hugetlb
>>>>>> -			 *		 folio will be put back
>>>>>> +			 *	Success: folio will be freed
>>>>>>  			 *	-EAGAIN: stay on the from list
>>>>>>  			 *	-ENOMEM: stay on the from list
>>>>>>  			 *	-ENOSYS: stay on the from list
>>>>>> @@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>>>  						stats.nr_thp_split += is_thp;
>>>>>>  						break;
>>>>>>  					}
>>>>>> -				/* Hugetlb migration is unsupported */
>>>>>>  				} else if (!no_split_folio_counting) {
>>>>>>  					nr_failed++;
>>>>>>  				}

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/8] migrate_pages: separate hugetlb folios migration
  2023-01-09  7:23             ` Huang, Ying
@ 2023-01-10  1:37               ` Alistair Popple
  0 siblings, 0 replies; 39+ messages in thread
From: Alistair Popple @ 2023-01-10  1:37 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Zi Yan, Yang Shi,
	Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
	haoxin


"Huang, Ying" <ying.huang@intel.com> writes:

> Alistair Popple <apopple@nvidia.com> writes:
>
>> "Huang, Ying" <ying.huang@intel.com> writes:
>>
>>> [snip]
>>>
>>>>
>>>>>>> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>>>>  		nr_retry_pages = 0;
>>>>>>>  
>>>>>>>  		list_for_each_entry_safe(folio, folio2, from, lru) {
>>>>>>> +			if (folio_test_hugetlb(folio)) {
>>>>>>
>>>>>> How do we hit this case? Shouldn't migrate_hugetlbs() have already moved
>>>>>> any hugetlb folios off the from list?
>>>>>
>>>>> Retried hugetlb folios will be kept in from list.
>>>>
>>>> Couldn't migrate_hugetlbs() remove the failing retried pages from the
>>>> list on the final pass? That seems cleaner to me.
>>>
>>> To do that, we need to go through the folio list again to remove all
>>> hugetlb pages.  It could be time-consuming in some cases.  So I think
>>> that it's better to keep this.
>>
>> Why? Couldn't we test pass == 9 and remove it from the list if it fails
>> the final retry in migrate_hugetlbs()? In any case if it's on the list
>> due to failed retries we have already passed over it 10 times, so the
>> extra loop hardly seems like a problem.
>
> Yes.  That's possible.  But "test pass == 9" looks more tricky than the
> current code.
>
> Feel free to change the code as you suggested on top of this series.  If
> no one else objects, I'm OK with that.  OK?

Sure. Part of my problem when reviewing this series is that every time I
look at migrate_pages(), and in particular the number of conditionals
that are sufficiently non-obvious to require extensive comments, I can't
help but think it all needs some refactoring before being made any more
complicated. However, perhaps I am alone in that.

Either way, this kind of refactoring has been on my TODO list for a
while; I have a WIP series to converge some of the migrate_device.c code,
which I will need to rebase on this anyway, so as you suggest I could
make a lot of my suggested changes on top of this series.

Regards,

Alistair

> Best Regards,
> Huang, Ying
>
>>>
>>>>>>> +				list_move_tail(&folio->lru, &ret_folios);
>>>>>>> +				continue;
>>>>>>> +			}
>>>>>>> +
>>>>>>>  			/*
>>>>>>>  			 * Large folio statistics is based on the source large
>>>>>>>  			 * folio. Capture required information that might get
>>>>>>>  			 * lost during migration.
>>>>>>>  			 */
>>>>>>> -			is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
>>>>>>> +			is_large = folio_test_large(folio);
>>>>>>>  			is_thp = is_large && folio_test_pmd_mappable(folio);
>>>>>>>  			nr_pages = folio_nr_pages(folio);
>>>>>>> +
>>>>>>>  			cond_resched();
>>>>>>>  
>>>>>>> -			if (folio_test_hugetlb(folio))
>>>>>>> -				rc = unmap_and_move_huge_page(get_new_page,
>>>>>>> -						put_new_page, private,
>>>>>>> -						&folio->page, pass > 2, mode,
>>>>>>> -						reason,
>>>>>>> -						&ret_folios);
>>>>>>> -			else
>>>>>>> -				rc = unmap_and_move(get_new_page, put_new_page,
>>>>>>> -						private, folio, pass > 2, mode,
>>>>>>> -						reason, &ret_folios);
>>>>>>> +			rc = unmap_and_move(get_new_page, put_new_page,
>>>>>>> +					    private, folio, pass > 2, mode,
>>>>>>> +					    reason, &ret_folios);
>>>>>>>  			/*
>>>>>>>  			 * The rules are:
>>>>>>> -			 *	Success: non hugetlb folio will be freed, hugetlb
>>>>>>> -			 *		 folio will be put back
>>>>>>> +			 *	Success: folio will be freed
>>>>>>>  			 *	-EAGAIN: stay on the from list
>>>>>>>  			 *	-ENOMEM: stay on the from list
>>>>>>>  			 *	-ENOSYS: stay on the from list
>>>>>>> @@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>>>>>  						stats.nr_thp_split += is_thp;
>>>>>>>  						break;
>>>>>>>  					}
>>>>>>> -				/* Hugetlb migration is unsupported */
>>>>>>>  				} else if (!no_split_folio_counting) {
>>>>>>>  					nr_failed++;
>>>>>>>  				}


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2023-01-10  1:53 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-27  0:28 [PATCH 0/8] migrate_pages(): batch TLB flushing Huang Ying
2022-12-27  0:28 ` [PATCH 1/8] migrate_pages: organize stats with struct migrate_pages_stats Huang Ying
2023-01-03 18:06   ` Zi Yan
2023-01-05  3:02   ` Alistair Popple
2023-01-05  5:53     ` Huang, Ying
2023-01-05  6:50       ` Alistair Popple
2023-01-05  7:06         ` Huang, Ying
2022-12-27  0:28 ` [PATCH 2/8] migrate_pages: separate hugetlb folios migration Huang Ying
2022-12-28 23:17   ` Andrew Morton
2023-01-02 23:53     ` Huang, Ying
2023-01-05  4:13   ` Alistair Popple
2023-01-05  5:51     ` Huang, Ying
2023-01-05  6:43       ` Alistair Popple
2023-01-05  7:31         ` Huang, Ying
2023-01-05  7:39           ` Alistair Popple
2023-01-09  7:23             ` Huang, Ying
2023-01-10  1:37               ` Alistair Popple
2022-12-27  0:28 ` [PATCH 3/8] migrate_pages: restrict number of pages to migrate in batch Huang Ying
2023-01-03 18:40   ` Zi Yan
2023-01-04  0:24     ` Huang, Ying
2022-12-27  0:28 ` [PATCH 4/8] migrate_pages: split unmap_and_move() to _unmap() and _move() Huang Ying
2023-01-03 18:55   ` Zi Yan
2023-01-05 18:26   ` Nathan Chancellor
2023-01-05 18:57     ` Kees Cook
2023-01-08 23:33       ` Huang, Ying
2022-12-27  0:28 ` [PATCH 5/8] migrate_pages: batch _unmap and _move Huang Ying
2022-12-28 23:22   ` Andrew Morton
2023-01-02 23:29     ` Huang, Ying
2023-01-03 19:01   ` Zi Yan
2023-01-04  0:34     ` Huang, Ying
2022-12-27  0:28 ` [PATCH 6/8] migrate_pages: move migrate_folio_done() and migrate_folio_unmap() Huang Ying
2023-01-03 19:02   ` Zi Yan
2023-01-04  1:26     ` Huang, Ying
2022-12-27  0:28 ` [PATCH 7/8] migrate_pages: share more code between _unmap and _move Huang Ying
2023-01-04  7:12   ` Alistair Popple
2023-01-06  4:15     ` Huang, Ying
2022-12-27  0:28 ` [PATCH 8/8] migrate_pages: batch flushing TLB Huang Ying
2023-01-03 19:19   ` Zi Yan
2023-01-04  1:41     ` Huang, Ying
