* [PATCH v6 0/6] Per-VMA lock support for swap and userfaults
@ 2023-06-30  2:04 Suren Baghdasaryan
  2023-06-30  2:04 ` [PATCH v6 1/6] swap: remove remnants of polling from read_swap_cache_async Suren Baghdasaryan
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Suren Baghdasaryan @ 2023-06-30  2:04 UTC (permalink / raw)
  To: akpm
  Cc: willy, hannes, mhocko, josef, jack, ldufour, laurent.dufour,
	michel, liam.howlett, jglisse, vbabka, minchan, dave,
	punit.agrawal, lstoakes, hdanton, apopple, peterx, ying.huang,
	david, yuzhao, dhowells, hughd, viro, brauner, pasha.tatashin,
	surenb, linux-mm, linux-fsdevel, linux-kernel, kernel-team

When per-VMA locks were introduced in [1], several types of page faults
would still fall back to mmap_lock to keep the patchset simple. Among them
are swap and userfault page faults. The main reason for skipping those
cases was that mmap_lock could be dropped while handling these faults,
which would have required additional logic to implement.
Implement the mechanism that allows the per-VMA lock to be dropped in
these cases.
First, change handle_mm_fault to drop the per-VMA lock when returning
VM_FAULT_RETRY or VM_FAULT_COMPLETED, to be consistent with the way
mmap_lock is handled. Then change folio_lock_or_retry to accept a vm_fault
and return vm_fault_t, which simplifies the later patches. Finally, allow
swap and uffd page faults to be handled under per-VMA locks by dropping
the per-VMA lock and retrying, the same way it is done under mmap_lock.
Naturally, once the VMA lock is dropped, that VMA should be assumed
unstable and must not be used.
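
For illustration, the resulting fault path on the architecture side looks
roughly like the sketch below (condensed from the arm64 hunk in patch 3/6,
with error handling and accounting omitted, so it is not a verbatim
excerpt):

	/* Try the lockless per-VMA path first. */
	vma = lock_vma_under_rcu(mm, addr);
	if (!vma)
		goto lock_mmap;

	fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
	/* After 3/6: on RETRY/COMPLETED the per-VMA lock is already dropped. */
	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
		vma_end_read(vma);

	if (!(fault & VM_FAULT_RETRY)) {
		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
		goto out;
	}
	/* The lock is already dropped; fall back and retry under mmap_lock. */
lock_mmap:
	...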

Changes since v5 posted at [2]:
- 6/6 moved changes in sanitize_fault_flags into 3/6, per Peter Xu
- rebased over Linus' ToT

Note: patch 3/6 will cause a trivial merge conflict in arch/arm64/mm/fault.c
when applied over the mm-unstable branch, due to a patch from the arm64
tree [3] which is missing from mm-unstable.

[1] https://lore.kernel.org/all/20230227173632.3292573-1-surenb@google.com/
[2] https://lore.kernel.org/all/20230628172529.744839-1-surenb@google.com/
[3] https://lore.kernel.org/all/20230524131305.2808-1-jszhang@kernel.org/

Suren Baghdasaryan (6):
  swap: remove remnants of polling from read_swap_cache_async
  mm: add missing VM_FAULT_RESULT_TRACE name for VM_FAULT_COMPLETED
  mm: drop per-VMA lock when returning VM_FAULT_RETRY or
    VM_FAULT_COMPLETED
  mm: change folio_lock_or_retry to use vm_fault directly
  mm: handle swap page faults under per-VMA lock
  mm: handle userfaults under VMA lock

 arch/arm64/mm/fault.c    |  3 ++-
 arch/powerpc/mm/fault.c  |  3 ++-
 arch/s390/mm/fault.c     |  3 ++-
 arch/x86/mm/fault.c      |  3 ++-
 fs/userfaultfd.c         | 34 ++++++++++++----------------
 include/linux/mm.h       | 37 ++++++++++++++++++++++++++++++
 include/linux/mm_types.h |  3 ++-
 include/linux/pagemap.h  |  9 ++++----
 mm/filemap.c             | 37 +++++++++++++++---------------
 mm/madvise.c             |  4 ++--
 mm/memory.c              | 49 ++++++++++++++++++++++------------------
 mm/swap.h                |  1 -
 mm/swap_state.c          | 12 ++++------
 13 files changed, 118 insertions(+), 80 deletions(-)

-- 
2.41.0.255.g8b1d071c50-goog



* [PATCH v6 1/6] swap: remove remnants of polling from read_swap_cache_async
  2023-06-30  2:04 [PATCH v6 0/6] Per-VMA lock support for swap and userfaults Suren Baghdasaryan
@ 2023-06-30  2:04 ` Suren Baghdasaryan
  2023-06-30  2:04 ` [PATCH v6 2/6] mm: add missing VM_FAULT_RESULT_TRACE name for VM_FAULT_COMPLETED Suren Baghdasaryan
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Suren Baghdasaryan @ 2023-06-30  2:04 UTC (permalink / raw)
  To: akpm
  Cc: willy, hannes, mhocko, josef, jack, ldufour, laurent.dufour,
	michel, liam.howlett, jglisse, vbabka, minchan, dave,
	punit.agrawal, lstoakes, hdanton, apopple, peterx, ying.huang,
	david, yuzhao, dhowells, hughd, viro, brauner, pasha.tatashin,
	surenb, linux-mm, linux-fsdevel, linux-kernel, kernel-team,
	Christoph Hellwig

Commit [1] introduced IO polling support during swapin to reduce
swap read latency for block devices that can be polled. However, the later
commit [2] removed polling support. Therefore it seems safe to remove
the do_poll parameter from read_swap_cache_async and always call
swap_readpage with synchronous=false, waiting for IO completion in
folio_lock_or_retry.

[1] commit 23955622ff8d ("swap: add block io poll in swapin path")
[2] commit 9650b453a3d4 ("block: ignore RWF_HIPRI hint for sync dio")

Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mm/madvise.c    |  4 ++--
 mm/swap.h       |  1 -
 mm/swap_state.c | 12 +++++-------
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 886f06066622..ac6d92f74f6d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -218,7 +218,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
 		ptep = NULL;
 
 		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
-					     vma, addr, false, &splug);
+					     vma, addr, &splug);
 		if (page)
 			put_page(page);
 	}
@@ -262,7 +262,7 @@ static void shmem_swapin_range(struct vm_area_struct *vma,
 		rcu_read_unlock();
 
 		page = read_swap_cache_async(entry, mapping_gfp_mask(mapping),
-					     vma, addr, false, &splug);
+					     vma, addr, &splug);
 		if (page)
 			put_page(page);
 
diff --git a/mm/swap.h b/mm/swap.h
index 7c033d793f15..8a3c7a0ace4f 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -46,7 +46,6 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping,
 struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 				   struct vm_area_struct *vma,
 				   unsigned long addr,
-				   bool do_poll,
 				   struct swap_iocb **plug);
 struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 				     struct vm_area_struct *vma,
diff --git a/mm/swap_state.c b/mm/swap_state.c
index f8ea7015bad4..5a690c79cc13 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -527,15 +527,14 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
  */
 struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 				   struct vm_area_struct *vma,
-				   unsigned long addr, bool do_poll,
-				   struct swap_iocb **plug)
+				   unsigned long addr, struct swap_iocb **plug)
 {
 	bool page_was_allocated;
 	struct page *retpage = __read_swap_cache_async(entry, gfp_mask,
 			vma, addr, &page_was_allocated);
 
 	if (page_was_allocated)
-		swap_readpage(retpage, do_poll, plug);
+		swap_readpage(retpage, false, plug);
 
 	return retpage;
 }
@@ -630,7 +629,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	struct swap_info_struct *si = swp_swap_info(entry);
 	struct blk_plug plug;
 	struct swap_iocb *splug = NULL;
-	bool do_poll = true, page_allocated;
+	bool page_allocated;
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long addr = vmf->address;
 
@@ -638,7 +637,6 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	if (!mask)
 		goto skip;
 
-	do_poll = false;
 	/* Read a page_cluster sized and aligned cluster around offset. */
 	start_offset = offset & ~mask;
 	end_offset = offset | mask;
@@ -670,7 +668,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 skip:
 	/* The page was likely read above, so no need for plugging here */
-	return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll, NULL);
+	return read_swap_cache_async(entry, gfp_mask, vma, addr, NULL);
 }
 
 int init_swap_address_space(unsigned int type, unsigned long nr_pages)
@@ -838,7 +836,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 skip:
 	/* The page was likely read above, so no need for plugging here */
 	return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address,
-				     ra_info.win == 1, NULL);
+				     NULL);
 }
 
 /**
-- 
2.41.0.255.g8b1d071c50-goog



* [PATCH v6 2/6] mm: add missing VM_FAULT_RESULT_TRACE name for VM_FAULT_COMPLETED
  2023-06-30  2:04 [PATCH v6 0/6] Per-VMA lock support for swap and userfaults Suren Baghdasaryan
  2023-06-30  2:04 ` [PATCH v6 1/6] swap: remove remnants of polling from read_swap_cache_async Suren Baghdasaryan
@ 2023-06-30  2:04 ` Suren Baghdasaryan
  2023-06-30  2:04 ` [PATCH v6 3/6] mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED Suren Baghdasaryan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Suren Baghdasaryan @ 2023-06-30  2:04 UTC (permalink / raw)
  To: akpm
  Cc: willy, hannes, mhocko, josef, jack, ldufour, laurent.dufour,
	michel, liam.howlett, jglisse, vbabka, minchan, dave,
	punit.agrawal, lstoakes, hdanton, apopple, peterx, ying.huang,
	david, yuzhao, dhowells, hughd, viro, brauner, pasha.tatashin,
	surenb, linux-mm, linux-fsdevel, linux-kernel, kernel-team

VM_FAULT_RESULT_TRACE should contain an element for every vm_fault_reason
so that it can be used as the flag_array inside trace_print_flags_seq().
The element for VM_FAULT_COMPLETED is missing; add it.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm_types.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index de10fc797c8e..39cd34b4dbaa 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1077,7 +1077,8 @@ enum vm_fault_reason {
 	{ VM_FAULT_RETRY,               "RETRY" },	\
 	{ VM_FAULT_FALLBACK,            "FALLBACK" },	\
 	{ VM_FAULT_DONE_COW,            "DONE_COW" },	\
-	{ VM_FAULT_NEEDDSYNC,           "NEEDDSYNC" }
+	{ VM_FAULT_NEEDDSYNC,           "NEEDDSYNC" },	\
+	{ VM_FAULT_COMPLETED,           "COMPLETED" }
 
 struct vm_special_mapping {
 	const char *name;	/* The name, e.g. "[vdso]". */
-- 
2.41.0.255.g8b1d071c50-goog



* [PATCH v6 3/6] mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED
  2023-06-30  2:04 [PATCH v6 0/6] Per-VMA lock support for swap and userfaults Suren Baghdasaryan
  2023-06-30  2:04 ` [PATCH v6 1/6] swap: remove remnants of polling from read_swap_cache_async Suren Baghdasaryan
  2023-06-30  2:04 ` [PATCH v6 2/6] mm: add missing VM_FAULT_RESULT_TRACE name for VM_FAULT_COMPLETED Suren Baghdasaryan
@ 2023-06-30  2:04 ` Suren Baghdasaryan
  2023-06-30  2:04 ` [PATCH v6 4/6] mm: change folio_lock_or_retry to use vm_fault directly Suren Baghdasaryan
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Suren Baghdasaryan @ 2023-06-30  2:04 UTC (permalink / raw)
  To: akpm
  Cc: willy, hannes, mhocko, josef, jack, ldufour, laurent.dufour,
	michel, liam.howlett, jglisse, vbabka, minchan, dave,
	punit.agrawal, lstoakes, hdanton, apopple, peterx, ying.huang,
	david, yuzhao, dhowells, hughd, viro, brauner, pasha.tatashin,
	surenb, linux-mm, linux-fsdevel, linux-kernel, kernel-team

handle_mm_fault returning VM_FAULT_RETRY or VM_FAULT_COMPLETED means
mmap_lock has been released. However, with per-VMA locks the behavior is
different: the caller still has to release the lock itself. To make the
rules consistent for the caller, drop the per-VMA lock before returning
VM_FAULT_RETRY or VM_FAULT_COMPLETED. Currently the only path returning
VM_FAULT_RETRY under per-VMA locks is do_swap_page, and no path returns
VM_FAULT_COMPLETED for now.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
 arch/arm64/mm/fault.c   |  3 ++-
 arch/powerpc/mm/fault.c |  3 ++-
 arch/s390/mm/fault.c    |  3 ++-
 arch/x86/mm/fault.c     |  3 ++-
 mm/memory.c             | 12 ++++++++++++
 5 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 935f0a8911f9..9d78ff78b0e3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -602,7 +602,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 5bfdf6ecfa96..82954d0e6906 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -489,7 +489,8 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	}
 
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index dbe8394234e2..40a71063949b 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -418,7 +418,8 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+		vma_end_read(vma);
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
 		goto out;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e8711b2cafaf..56b4f9faf8c4 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1341,7 +1341,8 @@ void do_user_addr_fault(struct pt_regs *regs,
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/mm/memory.c b/mm/memory.c
index 0ae594703021..5f26c56ce979 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3730,6 +3730,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
 		ret = VM_FAULT_RETRY;
+		vma_end_read(vma);
 		goto out;
 	}
 
@@ -5182,6 +5183,17 @@ static vm_fault_t sanitize_fault_flags(struct vm_area_struct *vma,
 				 !is_cow_mapping(vma->vm_flags)))
 			return VM_FAULT_SIGSEGV;
 	}
+#ifdef CONFIG_PER_VMA_LOCK
+	/*
+	 * Per-VMA locks can't be used with FAULT_FLAG_RETRY_NOWAIT because of
+	 * the assumption that lock is dropped on VM_FAULT_RETRY.
+	 */
+	if (WARN_ON_ONCE((*flags &
+			(FAULT_FLAG_VMA_LOCK | FAULT_FLAG_RETRY_NOWAIT)) ==
+			(FAULT_FLAG_VMA_LOCK | FAULT_FLAG_RETRY_NOWAIT)))
+		return VM_FAULT_SIGSEGV;
+#endif
+
 	return 0;
 }
 
-- 
2.41.0.255.g8b1d071c50-goog



* [PATCH v6 4/6] mm: change folio_lock_or_retry to use vm_fault directly
  2023-06-30  2:04 [PATCH v6 0/6] Per-VMA lock support for swap and userfaults Suren Baghdasaryan
                   ` (2 preceding siblings ...)
  2023-06-30  2:04 ` [PATCH v6 3/6] mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED Suren Baghdasaryan
@ 2023-06-30  2:04 ` Suren Baghdasaryan
  2023-06-30  3:36   ` Matthew Wilcox
  2023-06-30  2:04 ` [PATCH v6 5/6] mm: handle swap page faults under per-VMA lock Suren Baghdasaryan
  2023-06-30  2:04 ` [PATCH v6 6/6] mm: handle userfaults under VMA lock Suren Baghdasaryan
  5 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2023-06-30  2:04 UTC (permalink / raw)
  To: akpm
  Cc: willy, hannes, mhocko, josef, jack, ldufour, laurent.dufour,
	michel, liam.howlett, jglisse, vbabka, minchan, dave,
	punit.agrawal, lstoakes, hdanton, apopple, peterx, ying.huang,
	david, yuzhao, dhowells, hughd, viro, brauner, pasha.tatashin,
	surenb, linux-mm, linux-fsdevel, linux-kernel, kernel-team

Change folio_lock_or_retry to accept vm_fault struct and return the
vm_fault_t directly.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
 include/linux/pagemap.h |  9 ++++-----
 mm/filemap.c            | 22 ++++++++++++----------
 mm/memory.c             | 14 ++++++--------
 3 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 716953ee1ebd..0026a0a8277c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -900,8 +900,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 
 void __folio_lock(struct folio *folio);
 int __folio_lock_killable(struct folio *folio);
-bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
-				unsigned int flags);
+vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf);
 void unlock_page(struct page *page);
 void folio_unlock(struct folio *folio);
 
@@ -1005,11 +1004,11 @@ static inline int folio_lock_killable(struct folio *folio)
  * Return value and mmap_lock implications depend on flags; see
  * __folio_lock_or_retry().
  */
-static inline bool folio_lock_or_retry(struct folio *folio,
-		struct mm_struct *mm, unsigned int flags)
+static inline vm_fault_t folio_lock_or_retry(struct folio *folio,
+					     struct vm_fault *vmf)
 {
 	might_sleep();
-	return folio_trylock(folio) || __folio_lock_or_retry(folio, mm, flags);
+	return folio_trylock(folio) ? 0 : __folio_lock_or_retry(folio, vmf);
 }
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 9e44a49bbd74..d245bb4f7153 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1669,32 +1669,34 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
 
 /*
  * Return values:
- * true - folio is locked; mmap_lock is still held.
- * false - folio is not locked.
+ * 0 - folio is locked.
+ * VM_FAULT_RETRY - folio is not locked.
  *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
  *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
  *     which case mmap_lock is still held.
  *
- * If neither ALLOW_RETRY nor KILLABLE are set, will always return true
+ * If neither ALLOW_RETRY nor KILLABLE are set, will always return 0
  * with the folio locked and the mmap_lock unperturbed.
  */
-bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
-			 unsigned int flags)
+vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
 {
+	struct mm_struct *mm = vmf->vma->vm_mm;
+	unsigned int flags = vmf->flags;
+
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
 		 * CAUTION! In this case, mmap_lock is not released
-		 * even though return 0.
+		 * even though return VM_FAULT_RETRY.
 		 */
 		if (flags & FAULT_FLAG_RETRY_NOWAIT)
-			return false;
+			return VM_FAULT_RETRY;
 
 		mmap_read_unlock(mm);
 		if (flags & FAULT_FLAG_KILLABLE)
 			folio_wait_locked_killable(folio);
 		else
 			folio_wait_locked(folio);
-		return false;
+		return VM_FAULT_RETRY;
 	}
 	if (flags & FAULT_FLAG_KILLABLE) {
 		bool ret;
@@ -1702,13 +1704,13 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
 		ret = __folio_lock_killable(folio);
 		if (ret) {
 			mmap_read_unlock(mm);
-			return false;
+			return VM_FAULT_RETRY;
 		}
 	} else {
 		__folio_lock(folio);
 	}
 
-	return true;
+	return 0;
 }
 
 /**
diff --git a/mm/memory.c b/mm/memory.c
index 5f26c56ce979..4ae3f046f593 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3582,6 +3582,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	struct folio *folio = page_folio(vmf->page);
 	struct vm_area_struct *vma = vmf->vma;
 	struct mmu_notifier_range range;
+	vm_fault_t ret;
 
 	/*
 	 * We need a reference to lock the folio because we don't hold
@@ -3594,9 +3595,10 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	if (!folio_try_get(folio))
 		return 0;
 
-	if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
+	ret = folio_lock_or_retry(folio, vmf);
+	if (ret) {
 		folio_put(folio);
-		return VM_FAULT_RETRY;
+		return ret;
 	}
 	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
 				vma->vm_mm, vmf->address & PAGE_MASK,
@@ -3721,7 +3723,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	bool exclusive = false;
 	swp_entry_t entry;
 	pte_t pte;
-	int locked;
 	vm_fault_t ret = 0;
 	void *shadow = NULL;
 
@@ -3844,12 +3845,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_release;
 	}
 
-	locked = folio_lock_or_retry(folio, vma->vm_mm, vmf->flags);
-
-	if (!locked) {
-		ret |= VM_FAULT_RETRY;
+	ret |= folio_lock_or_retry(folio, vmf);
+	if (ret & VM_FAULT_RETRY)
 		goto out_release;
-	}
 
 	if (swapcache) {
 		/*
-- 
2.41.0.255.g8b1d071c50-goog



* [PATCH v6 5/6] mm: handle swap page faults under per-VMA lock
  2023-06-30  2:04 [PATCH v6 0/6] Per-VMA lock support for swap and userfaults Suren Baghdasaryan
                   ` (3 preceding siblings ...)
  2023-06-30  2:04 ` [PATCH v6 4/6] mm: change folio_lock_or_retry to use vm_fault directly Suren Baghdasaryan
@ 2023-06-30  2:04 ` Suren Baghdasaryan
  2023-06-30  2:04 ` [PATCH v6 6/6] mm: handle userfaults under VMA lock Suren Baghdasaryan
  5 siblings, 0 replies; 11+ messages in thread
From: Suren Baghdasaryan @ 2023-06-30  2:04 UTC (permalink / raw)
  To: akpm
  Cc: willy, hannes, mhocko, josef, jack, ldufour, laurent.dufour,
	michel, liam.howlett, jglisse, vbabka, minchan, dave,
	punit.agrawal, lstoakes, hdanton, apopple, peterx, ying.huang,
	david, yuzhao, dhowells, hughd, viro, brauner, pasha.tatashin,
	surenb, linux-mm, linux-fsdevel, linux-kernel, kernel-team

When a page fault is handled under per-VMA lock protection, all swap page
faults are currently retried with mmap_lock because folio_lock_or_retry
has to drop and reacquire mmap_lock if the folio could not be immediately
locked.
Follow the same pattern as with mmap_lock: drop the per-VMA lock when
waiting for the folio and retry once the folio is available.
With this obstacle removed, enable do_swap_page to operate under
per-VMA lock protection. Drivers implementing ops->migrate_to_ram might
still rely on mmap_lock, therefore we have to fall back to mmap_lock in
that particular case.
Note that the only time do_swap_page calls synchronous swap_readpage
is when SWP_SYNCHRONOUS_IO is set, which is set only for
QUEUE_FLAG_SYNCHRONOUS devices: brd, zram and nvdimms (both btt and
pmem). Therefore we don't sleep in this path, and there's no need to
drop the mmap or per-VMA lock.
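
Taken together with patch 4/6, the effect on the folio wait path can be
summarized with the following condensed sketch (not a verbatim excerpt;
the actual code is in the hunks below):

	static inline void release_fault_lock(struct vm_fault *vmf)
	{
		if (vmf->flags & FAULT_FLAG_VMA_LOCK)
			vma_end_read(vmf->vma);
		else
			mmap_read_unlock(vmf->vma->vm_mm);
	}

	/* In __folio_lock_or_retry(), simplified: */
	if (fault_flag_allow_retry_first(flags)) {
		if (flags & FAULT_FLAG_RETRY_NOWAIT)
			return VM_FAULT_RETRY;	/* lock intentionally kept */
		release_fault_lock(vmf);	/* mmap_lock or per-VMA lock */
		folio_wait_locked(folio);	/* or the killable variant */
		return VM_FAULT_RETRY;
	}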

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h | 13 +++++++++++++
 mm/filemap.c       | 17 ++++++++---------
 mm/memory.c        | 16 ++++++++++------
 3 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 39aa409e84d5..54ab11214f4f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -720,6 +720,14 @@ static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
 	vma->detached = detached;
 }
 
+static inline void release_fault_lock(struct vm_fault *vmf)
+{
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		vma_end_read(vmf->vma);
+	else
+		mmap_read_unlock(vmf->vma->vm_mm);
+}
+
 struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 					  unsigned long address);
 
@@ -735,6 +743,11 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
 static inline void vma_mark_detached(struct vm_area_struct *vma,
 				     bool detached) {}
 
+static inline void release_fault_lock(struct vm_fault *vmf)
+{
+	mmap_read_unlock(vmf->vma->vm_mm);
+}
+
 #endif /* CONFIG_PER_VMA_LOCK */
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index d245bb4f7153..6f4a3d83a073 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1671,27 +1671,26 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
  * Return values:
  * 0 - folio is locked.
  * VM_FAULT_RETRY - folio is not locked.
- *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
- *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
- *     which case mmap_lock is still held.
+ *     mmap_lock or per-VMA lock has been released (mmap_read_unlock() or
+ *     vma_end_read()), unless flags had both FAULT_FLAG_ALLOW_RETRY and
+ *     FAULT_FLAG_RETRY_NOWAIT set, in which case the lock is still held.
  *
  * If neither ALLOW_RETRY nor KILLABLE are set, will always return 0
- * with the folio locked and the mmap_lock unperturbed.
+ * with the folio locked and the mmap_lock/per-VMA lock is left unperturbed.
  */
 vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
 {
-	struct mm_struct *mm = vmf->vma->vm_mm;
 	unsigned int flags = vmf->flags;
 
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
-		 * CAUTION! In this case, mmap_lock is not released
-		 * even though return VM_FAULT_RETRY.
+		 * CAUTION! In this case, mmap_lock/per-VMA lock is not
+		 * released even though returning VM_FAULT_RETRY.
 		 */
 		if (flags & FAULT_FLAG_RETRY_NOWAIT)
 			return VM_FAULT_RETRY;
 
-		mmap_read_unlock(mm);
+		release_fault_lock(vmf);
 		if (flags & FAULT_FLAG_KILLABLE)
 			folio_wait_locked_killable(folio);
 		else
@@ -1703,7 +1702,7 @@ vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
 
 		ret = __folio_lock_killable(folio);
 		if (ret) {
-			mmap_read_unlock(mm);
+			release_fault_lock(vmf);
 			return VM_FAULT_RETRY;
 		}
 	} else {
diff --git a/mm/memory.c b/mm/memory.c
index 4ae3f046f593..bb0f68a73b0c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3729,12 +3729,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!pte_unmap_same(vmf))
 		goto out;
 
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-		ret = VM_FAULT_RETRY;
-		vma_end_read(vma);
-		goto out;
-	}
-
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
@@ -3744,6 +3738,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			vmf->page = pfn_swap_entry_to_page(entry);
 			ret = remove_device_exclusive_entry(vmf);
 		} else if (is_device_private_entry(entry)) {
+			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+				/*
+				 * migrate_to_ram is not yet ready to operate
+				 * under VMA lock.
+				 */
+				vma_end_read(vma);
+				ret = VM_FAULT_RETRY;
+				goto out;
+			}
+
 			vmf->page = pfn_swap_entry_to_page(entry);
 			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 					vmf->address, &vmf->ptl);
-- 
2.41.0.255.g8b1d071c50-goog



* [PATCH v6 6/6] mm: handle userfaults under VMA lock
  2023-06-30  2:04 [PATCH v6 0/6] Per-VMA lock support for swap and userfaults Suren Baghdasaryan
                   ` (4 preceding siblings ...)
  2023-06-30  2:04 ` [PATCH v6 5/6] mm: handle swap page faults under per-VMA lock Suren Baghdasaryan
@ 2023-06-30  2:04 ` Suren Baghdasaryan
  5 siblings, 0 replies; 11+ messages in thread
From: Suren Baghdasaryan @ 2023-06-30  2:04 UTC (permalink / raw)
  To: akpm
  Cc: willy, hannes, mhocko, josef, jack, ldufour, laurent.dufour,
	michel, liam.howlett, jglisse, vbabka, minchan, dave,
	punit.agrawal, lstoakes, hdanton, apopple, peterx, ying.huang,
	david, yuzhao, dhowells, hughd, viro, brauner, pasha.tatashin,
	surenb, linux-mm, linux-fsdevel, linux-kernel, kernel-team

Enable handle_userfault to operate under the VMA lock by releasing the
VMA lock instead of mmap_lock and retrying. Note that
FAULT_FLAG_RETRY_NOWAIT should never be used when handling faults under
per-VMA lock protection because that would break the assumption that the
lock is dropped on retry.
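
A condensed sketch of the resulting locking in handle_userfault() (not a
verbatim excerpt; see the hunks below for the real code):

	vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
	{
		...
		/* mmap_lock or per-VMA lock must be held, per vmf->flags */
		assert_fault_locked(vmf);
		...
		/* drop whichever lock is held before waiting for userspace */
		release_fault_lock(vmf);
		...
	}

With userfaults handled this way, lock_vma_under_rcu() no longer needs to
bail out on userfaultfd-armed VMAs, so that fallback is removed as well.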

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c   | 34 ++++++++++++++--------------------
 include/linux/mm.h | 24 ++++++++++++++++++++++++
 mm/memory.c        |  9 ---------
 3 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 7cecd49e078b..21a546eaf9f7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -277,17 +277,16 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
  * hugepmd ranges.
  */
 static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
-					 struct vm_area_struct *vma,
-					 unsigned long address,
-					 unsigned long flags,
-					 unsigned long reason)
+					      struct vm_fault *vmf,
+					      unsigned long reason)
 {
+	struct vm_area_struct *vma = vmf->vma;
 	pte_t *ptep, pte;
 	bool ret = true;
 
-	mmap_assert_locked(ctx->mm);
+	assert_fault_locked(vmf);
 
-	ptep = hugetlb_walk(vma, address, vma_mmu_pagesize(vma));
+	ptep = hugetlb_walk(vma, vmf->address, vma_mmu_pagesize(vma));
 	if (!ptep)
 		goto out;
 
@@ -308,10 +307,8 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 }
 #else
 static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
-					 struct vm_area_struct *vma,
-					 unsigned long address,
-					 unsigned long flags,
-					 unsigned long reason)
+					      struct vm_fault *vmf,
+					      unsigned long reason)
 {
 	return false;	/* should never get here */
 }
@@ -325,11 +322,11 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
  * threads.
  */
 static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
-					 unsigned long address,
-					 unsigned long flags,
+					 struct vm_fault *vmf,
 					 unsigned long reason)
 {
 	struct mm_struct *mm = ctx->mm;
+	unsigned long address = vmf->address;
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
@@ -338,7 +335,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	pte_t ptent;
 	bool ret = true;
 
-	mmap_assert_locked(mm);
+	assert_fault_locked(vmf);
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
@@ -440,7 +437,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	 * Coredumping runs without mmap_lock so we can only check that
 	 * the mmap_lock is held, if PF_DUMPCORE was not set.
 	 */
-	mmap_assert_locked(mm);
+	assert_fault_locked(vmf);
 
 	ctx = vma->vm_userfaultfd_ctx.ctx;
 	if (!ctx)
@@ -556,15 +553,12 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	spin_unlock_irq(&ctx->fault_pending_wqh.lock);
 
 	if (!is_vm_hugetlb_page(vma))
-		must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
-						  reason);
+		must_wait = userfaultfd_must_wait(ctx, vmf, reason);
 	else
-		must_wait = userfaultfd_huge_must_wait(ctx, vma,
-						       vmf->address,
-						       vmf->flags, reason);
+		must_wait = userfaultfd_huge_must_wait(ctx, vmf, reason);
 	if (is_vm_hugetlb_page(vma))
 		hugetlb_vma_unlock_read(vma);
-	mmap_read_unlock(mm);
+	release_fault_lock(vmf);
 
 	if (likely(must_wait && !READ_ONCE(ctx->released))) {
 		wake_up_poll(&ctx->fd_wqh, EPOLLIN);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 54ab11214f4f..2794225b2d42 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -705,6 +705,17 @@ static inline bool vma_try_start_write(struct vm_area_struct *vma)
 	return true;
 }
 
+static inline void vma_assert_locked(struct vm_area_struct *vma)
+{
+	int mm_lock_seq;
+
+	if (__is_vma_write_locked(vma, &mm_lock_seq))
+		return;
+
+	lockdep_assert_held(&vma->vm_lock->lock);
+	VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_lock->lock), vma);
+}
+
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 {
 	int mm_lock_seq;
@@ -728,6 +739,14 @@ static inline void release_fault_lock(struct vm_fault *vmf)
 		mmap_read_unlock(vmf->vma->vm_mm);
 }
 
+static inline void assert_fault_locked(struct vm_fault *vmf)
+{
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		vma_assert_locked(vmf->vma);
+	else
+		mmap_assert_locked(vmf->vma->vm_mm);
+}
+
 struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 					  unsigned long address);
 
@@ -748,6 +767,11 @@ static inline void release_fault_lock(struct vm_fault *vmf)
 	mmap_read_unlock(vmf->vma->vm_mm);
 }
 
+static inline void assert_fault_locked(struct vm_fault *vmf)
+{
+	mmap_assert_locked(vmf->vma->vm_mm);
+}
+
 #endif /* CONFIG_PER_VMA_LOCK */
 
 /*
diff --git a/mm/memory.c b/mm/memory.c
index bb0f68a73b0c..d9f36f9392a9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5407,15 +5407,6 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma_start_read(vma))
 		goto inval;
 
-	/*
-	 * Due to the possibility of userfault handler dropping mmap_lock, avoid
-	 * it for now and fall back to page fault handling under mmap_lock.
-	 */
-	if (userfaultfd_armed(vma)) {
-		vma_end_read(vma);
-		goto inval;
-	}
-
 	/* Check since vm_start/vm_end might change before we lock the VMA */
 	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
 		vma_end_read(vma);
-- 
2.41.0.255.g8b1d071c50-goog



* Re: [PATCH v6 4/6] mm: change folio_lock_or_retry to use vm_fault directly
  2023-06-30  2:04 ` [PATCH v6 4/6] mm: change folio_lock_or_retry to use vm_fault directly Suren Baghdasaryan
@ 2023-06-30  3:36   ` Matthew Wilcox
  2023-06-30  3:45     ` Suren Baghdasaryan
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2023-06-30  3:36 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, hannes, mhocko, josef, jack, ldufour, laurent.dufour,
	michel, liam.howlett, jglisse, vbabka, minchan, dave,
	punit.agrawal, lstoakes, hdanton, apopple, peterx, ying.huang,
	david, yuzhao, dhowells, hughd, viro, brauner, pasha.tatashin,
	linux-mm, linux-fsdevel, linux-kernel, kernel-team

On Thu, Jun 29, 2023 at 07:04:33PM -0700, Suren Baghdasaryan wrote:
> Change folio_lock_or_retry to accept vm_fault struct and return the
> vm_fault_t directly.

I thought we decided to call this folio_lock_fault()?

> +static inline vm_fault_t folio_lock_or_retry(struct folio *folio,
> +					     struct vm_fault *vmf)
>  {
>  	might_sleep();
> -	return folio_trylock(folio) || __folio_lock_or_retry(folio, mm, flags);
> +	return folio_trylock(folio) ? 0 : __folio_lock_or_retry(folio, vmf);

No, don't use the awful ternary operator.  The || form is used
everywhere else.

>  /*
>   * Return values:
> - * true - folio is locked; mmap_lock is still held.
> - * false - folio is not locked.
> + * 0 - folio is locked.
> + * VM_FAULT_RETRY - folio is not locked.

I don't think we want to be so prescriptive here.  It returns non-zero
if the folio is not locked.  The precise value is not something that
callers should depend on.



* Re: [PATCH v6 4/6] mm: change folio_lock_or_retry to use vm_fault directly
  2023-06-30  3:36   ` Matthew Wilcox
@ 2023-06-30  3:45     ` Suren Baghdasaryan
  2023-06-30  3:47       ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2023-06-30  3:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, hannes, mhocko, josef, jack, ldufour, laurent.dufour,
	michel, liam.howlett, jglisse, vbabka, minchan, dave,
	punit.agrawal, lstoakes, hdanton, apopple, peterx, ying.huang,
	david, yuzhao, dhowells, hughd, viro, brauner, pasha.tatashin,
	linux-mm, linux-fsdevel, linux-kernel, kernel-team

On Thu, Jun 29, 2023 at 8:36 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Jun 29, 2023 at 07:04:33PM -0700, Suren Baghdasaryan wrote:
> > Change folio_lock_or_retry to accept vm_fault struct and return the
> > vm_fault_t directly.
>
> I thought we decided to call this folio_lock_fault()?
>
> > +static inline vm_fault_t folio_lock_or_retry(struct folio *folio,
> > +                                          struct vm_fault *vmf)
> >  {
> >       might_sleep();
> > -     return folio_trylock(folio) || __folio_lock_or_retry(folio, mm, flags);
> > +     return folio_trylock(folio) ? 0 : __folio_lock_or_retry(folio, vmf);
>
> No, don't use the awful ternary operator.  The || form is used
> everywhere else.

Ok, but folio_trylock() returns a boolean while folio_lock_or_retry
should return vm_fault_t. How exactly do you suggest changing this?
Something like this perhaps:

static inline vm_fault_t folio_lock_or_retry(struct folio *folio,
                                          struct vm_fault *vmf)
{
     might_sleep();
     if (folio_trylock(folio))
         return 0;
     return __folio_lock_or_retry(folio, vmf);
}

?


>
> >  /*
> >   * Return values:
> > - * true - folio is locked; mmap_lock is still held.
> > - * false - folio is not locked.
> > + * 0 - folio is locked.
> > + * VM_FAULT_RETRY - folio is not locked.
>
> I don't think we want to be so prescriptive here.  It returns non-zero
> if the folio is not locked.  The precise value is not something that
> callers should depend on.

Ok, I'll change it to "non-zero" here.

>


* Re: [PATCH v6 4/6] mm: change folio_lock_or_retry to use vm_fault directly
  2023-06-30  3:45     ` Suren Baghdasaryan
@ 2023-06-30  3:47       ` Matthew Wilcox
  2023-06-30 21:25         ` Suren Baghdasaryan
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2023-06-30  3:47 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, hannes, mhocko, josef, jack, ldufour, laurent.dufour,
	michel, liam.howlett, jglisse, vbabka, minchan, dave,
	punit.agrawal, lstoakes, hdanton, apopple, peterx, ying.huang,
	david, yuzhao, dhowells, hughd, viro, brauner, pasha.tatashin,
	linux-mm, linux-fsdevel, linux-kernel, kernel-team

On Thu, Jun 29, 2023 at 08:45:39PM -0700, Suren Baghdasaryan wrote:
> On Thu, Jun 29, 2023 at 8:36 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Jun 29, 2023 at 07:04:33PM -0700, Suren Baghdasaryan wrote:
> > > Change folio_lock_or_retry to accept vm_fault struct and return the
> > > vm_fault_t directly.
> >
> > I thought we decided to call this folio_lock_fault()?
> >
> > > +static inline vm_fault_t folio_lock_or_retry(struct folio *folio,
> > > +                                          struct vm_fault *vmf)
> > >  {
> > >       might_sleep();
> > > -     return folio_trylock(folio) || __folio_lock_or_retry(folio, mm, flags);
> > > +     return folio_trylock(folio) ? 0 : __folio_lock_or_retry(folio, vmf);
> >
> > No, don't use the awful ternary operator.  The || form is used
> > everywhere else.
> 
> Ok, but folio_trylock() returns a boolean while folio_lock_or_retry
> should return vm_fault_t. How exactly do you suggest changing this?
> Something like this perhaps:
> 
> static inline vm_fault_t folio_lock_or_retry(struct folio *folio,
>                                           struct vm_fault *vmf)
> {
>      might_sleep();
>      if (folio_trylock(folio))
>          return 0;
>      return __folio_lock_or_retry(folio, vmf);
> }
> 
> ?

I think the automatic casting would work, but I prefer what you've
written here.

> > >  /*
> > >   * Return values:
> > > - * true - folio is locked; mmap_lock is still held.
> > > - * false - folio is not locked.
> > > + * 0 - folio is locked.
> > > + * VM_FAULT_RETRY - folio is not locked.
> >
> > I don't think we want to be so prescriptive here.  It returns non-zero
> > if the folio is not locked.  The precise value is not something that
> > callers should depend on.
> 
> Ok, I'll change it to "non-zero" here.

Thanks!


* Re: [PATCH v6 4/6] mm: change folio_lock_or_retry to use vm_fault directly
  2023-06-30  3:47       ` Matthew Wilcox
@ 2023-06-30 21:25         ` Suren Baghdasaryan
  0 siblings, 0 replies; 11+ messages in thread
From: Suren Baghdasaryan @ 2023-06-30 21:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, hannes, mhocko, josef, jack, ldufour, laurent.dufour,
	michel, liam.howlett, jglisse, vbabka, minchan, dave,
	punit.agrawal, lstoakes, hdanton, apopple, peterx, ying.huang,
	david, yuzhao, dhowells, hughd, viro, brauner, pasha.tatashin,
	linux-mm, linux-fsdevel, linux-kernel, kernel-team

On Thu, Jun 29, 2023 at 8:47 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Jun 29, 2023 at 08:45:39PM -0700, Suren Baghdasaryan wrote:
> > On Thu, Jun 29, 2023 at 8:36 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Thu, Jun 29, 2023 at 07:04:33PM -0700, Suren Baghdasaryan wrote:
> > > > Change folio_lock_or_retry to accept vm_fault struct and return the
> > > > vm_fault_t directly.
> > >
> > > I thought we decided to call this folio_lock_fault()?
> > >
> > > > +static inline vm_fault_t folio_lock_or_retry(struct folio *folio,
> > > > +                                          struct vm_fault *vmf)
> > > >  {
> > > >       might_sleep();
> > > > -     return folio_trylock(folio) || __folio_lock_or_retry(folio, mm, flags);
> > > > +     return folio_trylock(folio) ? 0 : __folio_lock_or_retry(folio, vmf);
> > >
> > > No, don't use the awful ternary operator.  The || form is used
> > > everywhere else.
> >
> > Ok, but folio_trylock() returns a boolean while folio_lock_or_retry
> > should return vm_fault_t. How exactly do you suggest changing this?
> > Something like this perhaps:
> >
> > static inline vm_fault_t folio_lock_or_retry(struct folio *folio,
> >                                           struct vm_fault *vmf)
> > {
> >      might_sleep();
> >      if (folio_trylock(folio))
> >          return 0;
> >      return __folio_lock_or_retry(folio, vmf);
> > }
> >
> > ?
>
> I think the automatic casting would work, but I prefer what you've
> written here.
>
> > > >  /*
> > > >   * Return values:
> > > > - * true - folio is locked; mmap_lock is still held.
> > > > - * false - folio is not locked.
> > > > + * 0 - folio is locked.
> > > > + * VM_FAULT_RETRY - folio is not locked.
> > >
> > > I don't think we want to be so prescriptive here.  It returns non-zero
> > > if the folio is not locked.  The precise value is not something that
> > > callers should depend on.
> >
> > Ok, I'll change it to "non-zero" here.
>
> Thanks!

Posted v7 at https://lore.kernel.org/all/20230630211957.1341547-1-surenb@google.com/

