From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, apopple@nvidia.com, david@redhat.com,
	dhowells@redhat.com, hughd@google.com, jgg@nvidia.com,
	jglisse@redhat.com, jhubbard@nvidia.com, linux-mm@kvack.org,
	mm-commits@vger.kernel.org, rcampbell@nvidia.com,
	torvalds@linux-foundation.org, willy@infradead.org
Subject: [patch 01/69] mm/migrate.c: rework migration_entry_wait() to not take a pageref
Date: Fri, 21 Jan 2022 22:10:46 -0800
Message-ID: <20220122061046.C2ZhZBoAE%akpm@linux-foundation.org>
In-Reply-To: <20220121221021.60533b009c357d660791476e@linux-foundation.org>

From: Alistair Popple <apopple@nvidia.com>
Subject: mm/migrate.c: rework migration_entry_wait() to not take a pageref

This fixes the FIXME in migrate_vma_check_page().

Before migrating a page, migration code takes a reference and checks that
there are no unexpected page references, failing the migration if there
are.  When a thread faults on a migration entry, it takes a temporary
reference to the page so that it can wait for the page to become
unlocked, which signifies that the migration entry has been removed.

This reference is dropped just prior to waiting on the page lock.
However, while it is held the extra reference can cause migration
failures, so it is desirable to avoid taking it.
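
As a rough userspace illustration of the failure mode described above
(not kernel code; struct model_page and the model_*() helpers are
hypothetical names made up for this sketch), the reference check can be
modelled as a comparison against the number of references migration
expects:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the refcount is modelled. */
struct model_page {
	int refcount;
};

/*
 * Simplified model of the migration check: the page can be migrated
 * only if every reference on it is one the migration code expects.
 */
static bool model_can_migrate(const struct model_page *page, int expected_refs)
{
	return page->refcount == expected_refs;
}

/* Old behaviour: a faulting thread takes a temporary reference to wait. */
static void model_waiter_get(struct model_page *page)
{
	page->refcount++;
}

/* The temporary reference is dropped just before sleeping on the lock. */
static void model_waiter_put(struct model_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}
```

If the check runs between model_waiter_get() and model_waiter_put(), it
sees one more reference than expected and the migration is refused, even
though the waiter never intended to pin the page.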

As migration code already holds a reference to the migrating page, an
extra reference to wait on PG_locked is unnecessary so long as the
migration reference can't be dropped whilst setting up the wait.

When faulting on a migration entry the ptl is taken to check the migration
entry.  Removing a migration entry also requires the ptl, and migration
code won't drop its page reference until after the migration entry has
been removed.  Therefore retaining the ptl of a migration entry is
sufficient to ensure the page has a reference.  Reworking
migration_entry_wait() to hold the ptl until the wait setup is complete
means the extra page reference is no longer needed.
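
The ordering argument above can be sketched as a small single-threaded
model.  This is only an illustration under simplifying assumptions: the
lock is a plain flag rather than a spinlock, and struct model_state and
the model_*() helpers are hypothetical names that do not exist in the
kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state for one migration entry. */
struct model_state {
	bool ptl_locked;	/* stands in for the page table lock */
	bool entry_present;	/* migration entry still installed */
	int page_refs;		/* references on the migrating page */
	bool waiter_queued;	/* faulting thread is on the wait queue */
};

/* Fault side: take the ptl before looking at the entry. */
static bool model_fault_lock(struct model_state *s)
{
	if (s->ptl_locked)
		return false;	/* the real lock would spin here */
	s->ptl_locked = true;
	return true;
}

/* Fault side: queue the waiter while still holding the ptl. */
static bool model_fault_queue(struct model_state *s)
{
	if (!s->ptl_locked || !s->entry_present)
		return false;
	/* Entry present under the ptl implies migration still holds a ref. */
	assert(s->page_refs > 0);
	s->waiter_queued = true;
	return true;
}

/* Fault side: drop the ptl only once the wait has been set up. */
static void model_fault_unlock(struct model_state *s)
{
	s->ptl_locked = false;
}

/* Migration side: removing the entry requires the ptl. */
static bool model_remove_entry(struct model_state *s)
{
	if (s->ptl_locked)
		return false;	/* blocked until the fault side unlocks */
	s->entry_present = false;
	return true;
}

/* Migration side: the page reference is dropped only after removal. */
static bool model_drop_migration_ref(struct model_state *s)
{
	if (s->entry_present)
		return false;
	s->page_refs--;
	return true;
}
```

While the fault side holds the flag, model_remove_entry() cannot
succeed, so model_drop_migration_ref() cannot either, and the page stays
referenced for the whole of the wait setup without the waiter taking a
reference of its own.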

[apopple@nvidia.com: v5]
  Link: https://lkml.kernel.org/r/20211213033848.1973946-1-apopple@nvidia.com
Link: https://lkml.kernel.org/r/20211118020754.954425-1-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/migrate.h |    2 
 mm/filemap.c            |   91 ++++++++++++++++++++++++++++++++++++++
 mm/migrate.c            |   38 +--------------
 3 files changed, 97 insertions(+), 34 deletions(-)

--- a/include/linux/migrate.h~mm-migratec-rework-migration_entry_wait-to-not-take-a-pageref
+++ a/include/linux/migrate.h
@@ -40,6 +40,8 @@ extern int migrate_huge_page_move_mappin
 				  struct page *newpage, struct page *page);
 extern int migrate_page_move_mapping(struct address_space *mapping,
 		struct page *newpage, struct page *page, int extra_count);
+void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
+				spinlock_t *ptl);
 void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
 void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
 int folio_migrate_mapping(struct address_space *mapping,
--- a/mm/filemap.c~mm-migratec-rework-migration_entry_wait-to-not-take-a-pageref
+++ a/mm/filemap.c
@@ -21,6 +21,7 @@
 #include <linux/gfp.h>
 #include <linux/mm.h>
 #include <linux/swap.h>
+#include <linux/swapops.h>
 #include <linux/mman.h>
 #include <linux/pagemap.h>
 #include <linux/file.h>
@@ -41,6 +42,7 @@
 #include <linux/psi.h>
 #include <linux/ramfs.h>
 #include <linux/page_idle.h>
+#include <linux/migrate.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -1388,6 +1390,95 @@ repeat:
 	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
 }
 
+#ifdef CONFIG_MIGRATION
+/**
+ * migration_entry_wait_on_locked - Wait for a migration entry to be removed
+ * @entry: migration swap entry.
+ * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required
+ *        for pte entries, pass NULL for pmd entries.
+ * @ptl: already locked ptl. This function will drop the lock.
+ *
+ * Wait for a migration entry referencing the given page to be removed. This is
+ * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
+ * this can be called without taking a reference on the page. Instead this
+ * should be called while holding the ptl for the migration entry referencing
+ * the page.
+ *
+ * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock().
+ *
+ * This follows the same logic as folio_wait_bit_common() so see the comments
+ * there.
+ */
+void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
+				spinlock_t *ptl)
+{
+	struct wait_page_queue wait_page;
+	wait_queue_entry_t *wait = &wait_page.wait;
+	bool thrashing = false;
+	bool delayacct = false;
+	unsigned long pflags;
+	wait_queue_head_t *q;
+	struct folio *folio = page_folio(pfn_swap_entry_to_page(entry));
+
+	q = folio_waitqueue(folio);
+	if (!folio_test_uptodate(folio) && folio_test_workingset(folio)) {
+		if (!folio_test_swapbacked(folio)) {
+			delayacct_thrashing_start();
+			delayacct = true;
+		}
+		psi_memstall_enter(&pflags);
+		thrashing = true;
+	}
+
+	init_wait(wait);
+	wait->func = wake_page_function;
+	wait_page.folio = folio;
+	wait_page.bit_nr = PG_locked;
+	wait->flags = 0;
+
+	spin_lock_irq(&q->lock);
+	folio_set_waiters(folio);
+	if (!folio_trylock_flag(folio, PG_locked, wait))
+		__add_wait_queue_entry_tail(q, wait);
+	spin_unlock_irq(&q->lock);
+
+	/*
+	 * If a migration entry exists for the page the migration path must hold
+	 * a valid reference to the page, and it must take the ptl to remove the
+	 * migration entry. So the page is valid until the ptl is dropped.
+	 */
+	if (ptep)
+		pte_unmap_unlock(ptep, ptl);
+	else
+		spin_unlock(ptl);
+
+	for (;;) {
+		unsigned int flags;
+
+		set_current_state(TASK_UNINTERRUPTIBLE);
+
+		/* Loop until we've been woken or interrupted */
+		flags = smp_load_acquire(&wait->flags);
+		if (!(flags & WQ_FLAG_WOKEN)) {
+			if (signal_pending_state(TASK_UNINTERRUPTIBLE, current))
+				break;
+
+			io_schedule();
+			continue;
+		}
+		break;
+	}
+
+	finish_wait(q, wait);
+
+	if (thrashing) {
+		if (delayacct)
+			delayacct_thrashing_end();
+		psi_memstall_leave(&pflags);
+	}
+}
+#endif
+
 void folio_wait_bit(struct folio *folio, int bit_nr)
 {
 	folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
--- a/mm/migrate.c~mm-migratec-rework-migration_entry_wait-to-not-take-a-pageref
+++ a/mm/migrate.c
@@ -291,7 +291,6 @@ void __migration_entry_wait(struct mm_st
 {
 	pte_t pte;
 	swp_entry_t entry;
-	struct folio *folio;
 
 	spin_lock(ptl);
 	pte = *ptep;
@@ -302,17 +301,7 @@ void __migration_entry_wait(struct mm_st
 	if (!is_migration_entry(entry))
 		goto out;
 
-	folio = page_folio(pfn_swap_entry_to_page(entry));
-
-	/*
-	 * Once page cache replacement of page migration started, page_count
-	 * is zero; but we must not call folio_put_wait_locked() without
-	 * a ref. Use folio_try_get(), and just fault again if it fails.
-	 */
-	if (!folio_try_get(folio))
-		goto out;
-	pte_unmap_unlock(ptep, ptl);
-	folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE);
+	migration_entry_wait_on_locked(entry, ptep, ptl);
 	return;
 out:
 	pte_unmap_unlock(ptep, ptl);
@@ -337,16 +326,11 @@ void migration_entry_wait_huge(struct vm
 void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
 {
 	spinlock_t *ptl;
-	struct folio *folio;
 
 	ptl = pmd_lock(mm, pmd);
 	if (!is_pmd_migration_entry(*pmd))
 		goto unlock;
-	folio = page_folio(pfn_swap_entry_to_page(pmd_to_swp_entry(*pmd)));
-	if (!folio_try_get(folio))
-		goto unlock;
-	spin_unlock(ptl);
-	folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE);
+	migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl);
 	return;
 unlock:
 	spin_unlock(ptl);
@@ -2431,22 +2415,8 @@ static bool migrate_vma_check_page(struc
 		return false;
 
 	/* Page from ZONE_DEVICE have one extra reference */
-	if (is_zone_device_page(page)) {
-		/*
-		 * Private page can never be pin as they have no valid pte and
-		 * GUP will fail for those. Yet if there is a pending migration
-		 * a thread might try to wait on the pte migration entry and
-		 * will bump the page reference count. Sadly there is no way to
-		 * differentiate a regular pin from migration wait. Hence to
-		 * avoid 2 racing thread trying to migrate back to CPU to enter
-		 * infinite loop (one stopping migration because the other is
-		 * waiting on pte migration entry). We always return true here.
-		 *
-		 * FIXME proper solution is to rework migration_entry_wait() so
-		 * it does not need to take a reference on page.
-		 */
-		return is_device_private_page(page);
-	}
+	if (is_zone_device_page(page))
+		extra++;
 
 	/* For file back page */
 	if (page_mapping(page))
_

