All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/17] Convert GUP to folios
@ 2022-01-02 21:57 Matthew Wilcox (Oracle)
  2022-01-02 21:57 ` [PATCH 01/17] mm: Add folio_put_refs() Matthew Wilcox (Oracle)
                   ` (18 more replies)
  0 siblings, 19 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

This patch series is against my current folio for-next branch.  I know
it won't apply to sfr's next tree, and it's not for-next material yet.
I intend to submit it for 5.18 after I've rebased it to one of the
5.17-rc releases.

The overall effect of this (ignoring the primary "preparing for folios
that are not PAGE or PMD sized" purpose) is to reduce the size of gup.o
by ~700 bytes in the config I normally test with.

This patchset just converts existing implementations to use folios.
There's no new API for consumers here to provide information in a more
efficient (address, length) format.  That will be a separate patchset.

Matthew Wilcox (Oracle) (17):
  mm: Add folio_put_refs()
  mm: Add folio_pincount_available()
  mm: Add folio_pincount_ptr()
  mm: Convert page_maybe_dma_pinned() to use a folio
  gup: Add try_get_folio()
  mm: Remove page_cache_add_speculative() and
    page_cache_get_speculative()
  gup: Add gup_put_folio()
  gup: Add try_grab_folio()
  gup: Convert gup_pte_range() to use a folio
  gup: Convert gup_hugepte() to use a folio
  gup: Convert gup_huge_pmd() to use a folio
  gup: Convert gup_huge_pud() to use a folio
  gup: Convert gup_huge_pgd() to use a folio
  gup: Convert for_each_compound_head() to gup_for_each_folio()
  gup: Convert for_each_compound_range() to gup_for_each_folio_range()
  mm: Add isolate_lru_folio()
  gup: Convert check_and_migrate_movable_pages() to use a folio

 arch/powerpc/include/asm/mmu_context.h |   1 -
 include/linux/mm.h                     |  58 +++--
 include/linux/mm_types.h               |   6 +
 include/linux/pagemap.h                |  11 -
 mm/folio-compat.c                      |   8 +
 mm/gup.c                               | 312 ++++++++++++-------------
 mm/hugetlb.c                           |   7 +-
 mm/internal.h                          |   3 +-
 mm/vmscan.c                            |  43 ++--
 9 files changed, 222 insertions(+), 227 deletions(-)

-- 
2.33.0



^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 01/17] mm: Add folio_put_refs()
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:00   ` Christoph Hellwig
  2022-01-04 21:15   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 02/17] mm: Add folio_pincount_available() Matthew Wilcox (Oracle)
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

This is like folio_put(), but puts N references at once instead of
just one.  It's like put_page_refs(), but does one atomic operation
instead of two, and is available to more than just gup.c.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d8b7d7ed14dd..98a10412d581 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1237,6 +1237,26 @@ static inline void folio_put(struct folio *folio)
 		__put_page(&folio->page);
 }
 
+/**
+ * folio_put_refs - Reduce the reference count on a folio.
+ * @folio: The folio.
+ * @refs: The number of references to reduce.
+ *
+ * If the folio's reference count reaches zero, the memory will be
+ * released back to the page allocator and may be used by another
+ * allocation immediately.  Do not access the memory or the struct folio
+ * after calling folio_put_refs() unless you can be sure that these weren't
+ * the last references.
+ *
+ * Context: May be called in process or interrupt context, but not in NMI
+ * context.  May be called while holding a spinlock.
+ */
+static inline void folio_put_refs(struct folio *folio, int refs)
+{
+	if (folio_ref_sub_and_test(folio, refs))
+		__put_page(&folio->page);
+}
+
 static inline void put_page(struct page *page)
 {
 	struct folio *folio = page_folio(page);
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 02/17] mm: Add folio_pincount_available()
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
  2022-01-02 21:57 ` [PATCH 01/17] mm: Add folio_put_refs() Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:01   ` Christoph Hellwig
  2022-01-04 21:40   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 03/17] mm: Add folio_pincount_ptr() Matthew Wilcox (Oracle)
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

Convert hpage_pincount_available() into folio_pincount_available() and
turn hpage_pincount_available() into a wrapper.  We don't need to
check folio_test_large() before checking folio_order() as
folio_order() includes a check of folio_test_large().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 98a10412d581..269b5484d66e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -927,15 +927,19 @@ static inline void destroy_compound_page(struct page *page)
 	compound_page_dtors[page[1].compound_dtor](page);
 }
 
-static inline bool hpage_pincount_available(struct page *page)
+static inline bool folio_pincount_available(struct folio *folio)
 {
 	/*
-	 * Can the page->hpage_pinned_refcount field be used? That field is in
+	 * Can the folio->hpage_pinned_refcount field be used? That field is in
 	 * the 3rd page of the compound page, so the smallest (2-page) compound
 	 * pages cannot support it.
 	 */
-	page = compound_head(page);
-	return PageCompound(page) && compound_order(page) > 1;
+	return folio_order(folio) > 1;
+}
+
+static inline bool hpage_pincount_available(struct page *page)
+{
+	return folio_pincount_available(page_folio(page));
 }
 
 static inline int head_compound_pincount(struct page *head)
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 03/17] mm: Add folio_pincount_ptr()
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
  2022-01-02 21:57 ` [PATCH 01/17] mm: Add folio_put_refs() Matthew Wilcox (Oracle)
  2022-01-02 21:57 ` [PATCH 02/17] mm: Add folio_pincount_available() Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:02   ` Christoph Hellwig
                     ` (2 more replies)
  2022-01-02 21:57 ` [PATCH 04/17] mm: Convert page_maybe_dma_pinned() to use a folio Matthew Wilcox (Oracle)
                   ` (15 subsequent siblings)
  18 siblings, 3 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

This is the folio equivalent of compound_pincount_ptr().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm_types.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c3a6e6209600..09d9e2c4a2c5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -309,6 +309,12 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
 	return &page[1].compound_mapcount;
 }
 
+static inline atomic_t *folio_pincount_ptr(struct folio *folio)
+{
+	struct page *tail = &folio->page + 2;
+	return &tail->hpage_pinned_refcount;
+}
+
 static inline atomic_t *compound_pincount_ptr(struct page *page)
 {
 	return &page[2].hpage_pinned_refcount;
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 04/17] mm: Convert page_maybe_dma_pinned() to use a folio
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 03/17] mm: Add folio_pincount_ptr() Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:03   ` Christoph Hellwig
  2022-01-04 22:01   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 05/17] gup: Add try_get_folio() Matthew Wilcox (Oracle)
                   ` (14 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

Replaces three calls to compound_head() with one.  This removes the last
user of compound_pincount(), so remove that helper too.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 269b5484d66e..00dcea53bb96 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -947,13 +947,6 @@ static inline int head_compound_pincount(struct page *head)
 	return atomic_read(compound_pincount_ptr(head));
 }
 
-static inline int compound_pincount(struct page *page)
-{
-	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
-	page = compound_head(page);
-	return head_compound_pincount(page);
-}
-
 static inline void set_compound_order(struct page *page, unsigned int order)
 {
 	page[1].compound_order = order;
@@ -1347,18 +1340,20 @@ void unpin_user_pages(struct page **pages, unsigned long npages);
  */
 static inline bool page_maybe_dma_pinned(struct page *page)
 {
-	if (hpage_pincount_available(page))
-		return compound_pincount(page) > 0;
+	struct folio *folio = page_folio(page);
+
+	if (folio_pincount_available(folio))
+		return atomic_read(folio_pincount_ptr(folio)) > 0;
 
 	/*
 	 * page_ref_count() is signed. If that refcount overflows, then
 	 * page_ref_count() returns a negative value, and callers will avoid
 	 * further incrementing the refcount.
 	 *
-	 * Here, for that overflow case, use the signed bit to count a little
+	 * Here, for that overflow case, use the sign bit to count a little
 	 * bit higher via unsigned math, and thus still get an accurate result.
 	 */
-	return ((unsigned int)page_ref_count(compound_head(page))) >=
+	return ((unsigned int)folio_ref_count(folio)) >=
 		GUP_PIN_COUNTING_BIAS;
 }
 
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 05/17] gup: Add try_get_folio()
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 04/17] mm: Convert page_maybe_dma_pinned() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:18   ` Christoph Hellwig
  2022-01-05  1:25   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 06/17] mm: Remove page_cache_add_speculative() and page_cache_get_speculative() Matthew Wilcox (Oracle)
                   ` (13 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

This replaces try_get_compound_head().  It includes a small optimisation
for the race where a folio is split between being looked up from its
tail page and the reference count being obtained.  Before, it returned
NULL, which presumably triggered a retry under the mmap_lock, whereas
now it will retry without the lock.  Finding a frozen page will still
return NULL.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 69 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 2c51e9748a6a..58e5cfaaa676 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,12 +29,11 @@ struct follow_page_context {
 	unsigned int page_mask;
 };
 
-static void hpage_pincount_add(struct page *page, int refs)
+static void folio_pincount_add(struct folio *folio, int refs)
 {
-	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
-	VM_BUG_ON_PAGE(page != compound_head(page), page);
+	VM_BUG_ON_FOLIO(!folio_pincount_available(folio), folio);
 
-	atomic_add(refs, compound_pincount_ptr(page));
+	atomic_add(refs, folio_pincount_ptr(folio));
 }
 
 static void hpage_pincount_sub(struct page *page, int refs)
@@ -63,33 +62,35 @@ static void put_page_refs(struct page *page, int refs)
 }
 
 /*
- * Return the compound head page with ref appropriately incremented,
+ * Return the folio with ref appropriately incremented,
  * or NULL if that failed.
  */
-static inline struct page *try_get_compound_head(struct page *page, int refs)
+static inline struct folio *try_get_folio(struct page *page, int refs)
 {
-	struct page *head = compound_head(page);
+	struct folio *folio;
 
-	if (WARN_ON_ONCE(page_ref_count(head) < 0))
+retry:
+	folio = page_folio(page);
+	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
 		return NULL;
-	if (unlikely(!page_cache_add_speculative(head, refs)))
+	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))
 		return NULL;
 
 	/*
-	 * At this point we have a stable reference to the head page; but it
-	 * could be that between the compound_head() lookup and the refcount
-	 * increment, the compound page was split, in which case we'd end up
-	 * holding a reference on a page that has nothing to do with the page
+	 * At this point we have a stable reference to the folio; but it
+	 * could be that between calling page_folio() and the refcount
+	 * increment, the folio was split, in which case we'd end up
+	 * holding a reference on a folio that has nothing to do with the page
 	 * we were given anymore.
-	 * So now that the head page is stable, recheck that the pages still
-	 * belong together.
+	 * So now that the folio is stable, recheck that the page still
+	 * belongs to this folio.
 	 */
-	if (unlikely(compound_head(page) != head)) {
-		put_page_refs(head, refs);
-		return NULL;
+	if (unlikely(page_folio(page) != folio)) {
+		folio_put_refs(folio, refs);
+		goto retry;
 	}
 
-	return head;
+	return folio;
 }
 
 /**
@@ -128,8 +129,10 @@ struct page *try_grab_compound_head(struct page *page,
 				    int refs, unsigned int flags)
 {
 	if (flags & FOLL_GET)
-		return try_get_compound_head(page, refs);
+		return &try_get_folio(page, refs)->page;
 	else if (flags & FOLL_PIN) {
+		struct folio *folio;
+
 		/*
 		 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 		 * right zone, so fail and let the caller fall back to the slow
@@ -143,29 +146,29 @@ struct page *try_grab_compound_head(struct page *page,
 		 * CAUTION: Don't use compound_head() on the page before this
 		 * point, the result won't be stable.
 		 */
-		page = try_get_compound_head(page, refs);
-		if (!page)
+		folio = try_get_folio(page, refs);
+		if (!folio)
 			return NULL;
 
 		/*
-		 * When pinning a compound page of order > 1 (which is what
+		 * When pinning a folio of order > 1 (which is what
 		 * hpage_pincount_available() checks for), use an exact count to
-		 * track it, via hpage_pincount_add/_sub().
+		 * track it, via folio_pincount_add/_sub().
 		 *
-		 * However, be sure to *also* increment the normal page refcount
-		 * field at least once, so that the page really is pinned.
+		 * However, be sure to *also* increment the normal folio refcount
+		 * field at least once, so that the folio really is pinned.
 		 * That's why the refcount from the earlier
-		 * try_get_compound_head() is left intact.
+		 * try_get_folio() is left intact.
 		 */
-		if (hpage_pincount_available(page))
-			hpage_pincount_add(page, refs);
+		if (folio_pincount_available(folio))
+			folio_pincount_add(folio, refs);
 		else
-			page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
+			folio_ref_add(folio,
+					refs * (GUP_PIN_COUNTING_BIAS - 1));
 
-		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
-				    refs);
+		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
 
-		return page;
+		return &folio->page;
 	}
 
 	WARN_ON_ONCE(1);
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 06/17] mm: Remove page_cache_add_speculative() and page_cache_get_speculative()
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 05/17] gup: Add try_get_folio() Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:18   ` Christoph Hellwig
  2022-01-05  1:29   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 07/17] gup: Add gup_put_folio() Matthew Wilcox (Oracle)
                   ` (12 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

These wrappers have no more callers, so delete them.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h      |  7 +++----
 include/linux/pagemap.h | 11 -----------
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00dcea53bb96..602de23482ef 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1298,10 +1298,9 @@ static inline void put_page(struct page *page)
  * applications that don't have huge page reference counts, this won't be an
  * issue.
  *
- * Locking: the lockless algorithm described in page_cache_get_speculative()
- * and page_cache_gup_pin_speculative() provides safe operation for
- * get_user_pages and page_mkclean and other calls that race to set up page
- * table entries.
+ * Locking: the lockless algorithm described in folio_try_get_rcu()
+ * provides safe operation for get_user_pages(), page_mkclean() and
+ * other calls that race to set up page table entries.
  */
 #define GUP_PIN_COUNTING_BIAS (1U << 10)
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 704cb1b4b15d..4a63176b6417 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -283,17 +283,6 @@ static inline struct inode *folio_inode(struct folio *folio)
 	return folio->mapping->host;
 }
 
-static inline bool page_cache_add_speculative(struct page *page, int count)
-{
-	VM_BUG_ON_PAGE(PageTail(page), page);
-	return folio_ref_try_add_rcu((struct folio *)page, count);
-}
-
-static inline bool page_cache_get_speculative(struct page *page)
-{
-	return page_cache_add_speculative(page, 1);
-}
-
 /**
  * folio_attach_private - Attach private data to a folio.
  * @folio: Folio to attach data to.
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 07/17] gup: Add gup_put_folio()
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 06/17] mm: Remove page_cache_add_speculative() and page_cache_get_speculative() Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:22   ` Christoph Hellwig
                     ` (2 more replies)
  2022-01-02 21:57 ` [PATCH 08/17] gup: Add try_grab_folio() Matthew Wilcox (Oracle)
                   ` (11 subsequent siblings)
  18 siblings, 3 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

put_compound_head() is turned into a call to gup_put_folio().
This removes the last call to put_page_refs(), so delete it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 44 +++++++++++++++-----------------------------
 1 file changed, 15 insertions(+), 29 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 58e5cfaaa676..6d827f7d66d8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -36,29 +36,11 @@ static void folio_pincount_add(struct folio *folio, int refs)
 	atomic_add(refs, folio_pincount_ptr(folio));
 }
 
-static void hpage_pincount_sub(struct page *page, int refs)
+static void folio_pincount_sub(struct folio *folio, int refs)
 {
-	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
-	VM_BUG_ON_PAGE(page != compound_head(page), page);
-
-	atomic_sub(refs, compound_pincount_ptr(page));
-}
-
-/* Equivalent to calling put_page() @refs times. */
-static void put_page_refs(struct page *page, int refs)
-{
-#ifdef CONFIG_DEBUG_VM
-	if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
-		return;
-#endif
+	VM_BUG_ON_FOLIO(!folio_pincount_available(folio), folio);
 
-	/*
-	 * Calling put_page() for each ref is unnecessarily slow. Only the last
-	 * ref needs a put_page().
-	 */
-	if (refs > 1)
-		page_ref_sub(page, refs - 1);
-	put_page(page);
+	atomic_sub(refs, folio_pincount_ptr(folio));
 }
 
 /*
@@ -175,19 +157,23 @@ struct page *try_grab_compound_head(struct page *page,
 	return NULL;
 }
 
-static void put_compound_head(struct page *page, int refs, unsigned int flags)
+static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 {
 	if (flags & FOLL_PIN) {
-		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
-				    refs);
-
-		if (hpage_pincount_available(page))
-			hpage_pincount_sub(page, refs);
+		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
+		if (folio_pincount_available(folio))
+			folio_pincount_sub(folio, refs);
 		else
 			refs *= GUP_PIN_COUNTING_BIAS;
 	}
 
-	put_page_refs(page, refs);
+	folio_put_refs(folio, refs);
+}
+
+static void put_compound_head(struct page *page, int refs, unsigned int flags)
+{
+	VM_BUG_ON_PAGE(PageTail(page), page);
+	gup_put_folio((struct folio *)page, refs, flags);
 }
 
 /**
@@ -228,7 +214,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
  */
 void unpin_user_page(struct page *page)
 {
-	put_compound_head(compound_head(page), 1, FOLL_PIN);
+	gup_put_folio(page_folio(page), 1, FOLL_PIN);
 }
 EXPORT_SYMBOL(unpin_user_page);
 
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 08/17] gup: Add try_grab_folio()
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 07/17] gup: Add gup_put_folio() Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:24   ` Christoph Hellwig
  2022-01-05  7:06   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 09/17] gup: Convert gup_pte_range() to use a folio Matthew Wilcox (Oracle)
                   ` (10 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

try_grab_compound_head() is turned into a call to try_grab_folio().
Convert the two callers who only care about a boolean success/fail.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h |  4 +---
 mm/gup.c           | 25 +++++++++++++------------
 mm/hugetlb.c       |  7 +++----
 3 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 602de23482ef..4e763a590c9c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1202,9 +1202,7 @@ static inline void get_page(struct page *page)
 }
 
 bool __must_check try_grab_page(struct page *page, unsigned int flags);
-struct page *try_grab_compound_head(struct page *page, int refs,
-				    unsigned int flags);
-
+struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
 
 static inline __must_check bool try_get_page(struct page *page)
 {
diff --git a/mm/gup.c b/mm/gup.c
index 6d827f7d66d8..2307b2917055 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -76,12 +76,8 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
 }
 
 /**
- * try_grab_compound_head() - attempt to elevate a page's refcount, by a
+ * try_grab_folio() - attempt to elevate a page's refcount, by a
  * flags-dependent amount.
- *
- * Even though the name includes "compound_head", this function is still
- * appropriate for callers that have a non-compound @page to get.
- *
  * @page:  pointer to page to be grabbed
  * @refs:  the value to (effectively) add to the page's refcount
  * @flags: gup flags: these are the FOLL_* flag values.
@@ -102,16 +98,15 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
  *    FOLL_PIN on normal pages, or compound pages that are two pages long:
  *    page's refcount will be incremented by @refs * GUP_PIN_COUNTING_BIAS.
  *
- * Return: head page (with refcount appropriately incremented) for success, or
+ * Return: folio (with refcount appropriately incremented) for success, or
  * NULL upon failure. If neither FOLL_GET nor FOLL_PIN was set, that's
  * considered failure, and furthermore, a likely bug in the caller, so a warning
  * is also emitted.
  */
-struct page *try_grab_compound_head(struct page *page,
-				    int refs, unsigned int flags)
+struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 {
 	if (flags & FOLL_GET)
-		return &try_get_folio(page, refs)->page;
+		return try_get_folio(page, refs);
 	else if (flags & FOLL_PIN) {
 		struct folio *folio;
 
@@ -150,13 +145,19 @@ struct page *try_grab_compound_head(struct page *page,
 
 		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
 
-		return &folio->page;
+		return folio;
 	}
 
 	WARN_ON_ONCE(1);
 	return NULL;
 }
 
+static inline struct page *try_grab_compound_head(struct page *page,
+		int refs, unsigned int flags)
+{
+	return &try_grab_folio(page, refs, flags)->page;
+}
+
 static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 {
 	if (flags & FOLL_PIN) {
@@ -188,7 +189,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
  * @flags:   gup flags: these are the FOLL_* flag values.
  *
  * Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same
- * time. Cases: please see the try_grab_compound_head() documentation, with
+ * time. Cases: please see the try_grab_folio() documentation, with
  * "refs=1".
  *
  * Return: true for success, or if no action was required (if neither FOLL_PIN
@@ -200,7 +201,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
 	if (!(flags & (FOLL_GET | FOLL_PIN)))
 		return true;
 
-	return try_grab_compound_head(page, 1, flags);
+	return try_grab_folio(page, 1, flags);
 }
 
 /**
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index abcd1785c629..ab67b13c4a71 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6072,7 +6072,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 		if (pages) {
 			/*
-			 * try_grab_compound_head() should always succeed here,
+			 * try_grab_folio() should always succeed here,
 			 * because: a) we hold the ptl lock, and b) we've just
 			 * checked that the huge page is present in the page
 			 * tables. If the huge page is present, then the tail
@@ -6081,9 +6081,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			 * any way. So this page must be available at this
 			 * point, unless the page refcount overflowed:
 			 */
-			if (WARN_ON_ONCE(!try_grab_compound_head(pages[i],
-								 refs,
-								 flags))) {
+			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
+							 flags))) {
 				spin_unlock(ptl);
 				remainder = 0;
 				err = -ENOMEM;
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 09/17] gup: Convert gup_pte_range() to use a folio
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 08/17] gup: Add try_grab_folio() Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:25   ` Christoph Hellwig
  2022-01-05  7:36   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 10/17] gup: Convert gup_hugepte() " Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

We still call try_grab_folio() once per PTE; a future patch could
optimise to just adjust the reference count for each page within
the folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 2307b2917055..d8535f9d5622 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2260,7 +2260,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 	ptem = ptep = pte_offset_map(&pmd, addr);
 	do {
 		pte_t pte = ptep_get_lockless(ptep);
-		struct page *head, *page;
+		struct page *page;
+		struct folio *folio;
 
 		/*
 		 * Similar to the PMD case below, NUMA hinting must take slow
@@ -2287,22 +2288,20 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
 
-		head = try_grab_compound_head(page, 1, flags);
-		if (!head)
+		folio = try_grab_folio(page, 1, flags);
+		if (!folio)
 			goto pte_unmap;
 
 		if (unlikely(page_is_secretmem(page))) {
-			put_compound_head(head, 1, flags);
+			gup_put_folio(folio, 1, flags);
 			goto pte_unmap;
 		}
 
 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-			put_compound_head(head, 1, flags);
+			gup_put_folio(folio, 1, flags);
 			goto pte_unmap;
 		}
 
-		VM_BUG_ON_PAGE(compound_head(page) != head, page);
-
 		/*
 		 * We need to make the page accessible if and only if we are
 		 * going to access its content (the FOLL_PIN case).  Please
@@ -2316,10 +2315,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 				goto pte_unmap;
 			}
 		}
-		SetPageReferenced(page);
+		folio_set_referenced(folio);
 		pages[*nr] = page;
 		(*nr)++;
-
 	} while (ptep++, addr += PAGE_SIZE, addr != end);
 
 	ret = 1;
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 10/17] gup: Convert gup_hugepte() to use a folio
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 09/17] gup: Convert gup_pte_range() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:26   ` Christoph Hellwig
  2022-01-05  7:46   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 11/17] gup: Convert gup_huge_pmd() " Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

There should be little to no effect from this patch; just removing
uses of some old APIs.

While I'm looking at this, take the opportunity to use nth_page()
instead of doing the arithmetic ourselves in case hugetlbfs pages
are ever allocated across memmap boundaries.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index d8535f9d5622..1c7fb668b46d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2435,7 +2435,7 @@ static int record_subpages(struct page *page, unsigned long addr,
 	int nr;
 
 	for (nr = 0; addr != end; addr += PAGE_SIZE)
-		pages[nr++] = page++;
+		pages[nr++] = nth_page(page, nr);
 
 	return nr;
 }
@@ -2453,7 +2453,8 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 		       struct page **pages, int *nr)
 {
 	unsigned long pte_end;
-	struct page *head, *page;
+	struct page *page;
+	struct folio *folio;
 	pte_t pte;
 	int refs;
 
@@ -2469,21 +2470,20 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	/* hugepages are never "special" */
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-	head = pte_page(pte);
-	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
+	page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_grab_compound_head(head, refs, flags);
-	if (!head)
+	folio = try_grab_folio(page, refs, flags);
+	if (!folio)
 		return 0;
 
 	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-		put_compound_head(head, refs, flags);
+		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
 
 	*nr += refs;
-	SetPageReferenced(head);
+	folio_set_referenced(folio);
 	return 1;
 }
 
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 11/17] gup: Convert gup_huge_pmd() to use a folio
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (9 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 10/17] gup: Convert gup_hugepte() " Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:26   ` Christoph Hellwig
  2022-01-05  7:50   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 12/17] gup: Convert gup_huge_pud() " Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

Use the new folio-based APIs.  Also fix an assumption that memmap is
contiguous.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 1c7fb668b46d..be965c965484 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2517,7 +2517,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 			unsigned long end, unsigned int flags,
 			struct page **pages, int *nr)
 {
-	struct page *head, *page;
+	struct page *page;
+	struct folio *folio;
 	int refs;
 
 	if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
@@ -2530,20 +2531,20 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_grab_compound_head(pmd_page(orig), refs, flags);
-	if (!head)
+	folio = try_grab_folio(page, refs, flags);
+	if (!folio)
 		return 0;
 
 	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
-		put_compound_head(head, refs, flags);
+		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
 
 	*nr += refs;
-	SetPageReferenced(head);
+	folio_set_referenced(folio);
 	return 1;
 }
 
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 12/17] gup: Convert gup_huge_pud() to use a folio
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (10 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 11/17] gup: Convert gup_huge_pmd() " Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-05  7:58   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 13/17] gup: Convert gup_huge_pgd() " Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

Use the new folio-based APIs.  Also fix an assumption that memmap is
contiguous.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index be965c965484..e7bcee8776e1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2552,7 +2552,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 			unsigned long end, unsigned int flags,
 			struct page **pages, int *nr)
 {
-	struct page *head, *page;
+	struct page *page;
+	struct folio *folio;
 	int refs;
 
 	if (!pud_access_permitted(orig, flags & FOLL_WRITE))
@@ -2565,20 +2566,20 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_grab_compound_head(pud_page(orig), refs, flags);
-	if (!head)
+	folio = try_grab_folio(page, refs, flags);
+	if (!folio)
 		return 0;
 
 	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
-		put_compound_head(head, refs, flags);
+		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
 
 	*nr += refs;
-	SetPageReferenced(head);
+	folio_set_referenced(folio);
 	return 1;
 }
 
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 13/17] gup: Convert gup_huge_pgd() to use a folio
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (11 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 12/17] gup: Convert gup_huge_pud() " Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-05  7:58   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio() Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

Use the new folio-based APIs.  Also fix an assumption that memmap is
contiguous.  This was the last user of try_grab_compound_head(),
so remove it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index e7bcee8776e1..7bd1e4a2648a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -152,12 +152,6 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 	return NULL;
 }
 
-static inline struct page *try_grab_compound_head(struct page *page,
-		int refs, unsigned int flags)
-{
-	return &try_grab_folio(page, refs, flags)->page;
-}
-
 static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 {
 	if (flags & FOLL_PIN) {
@@ -2588,27 +2582,28 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 			struct page **pages, int *nr)
 {
 	int refs;
-	struct page *head, *page;
+	struct page *page;
+	struct folio *folio;
 
 	if (!pgd_access_permitted(orig, flags & FOLL_WRITE))
 		return 0;
 
 	BUILD_BUG_ON(pgd_devmap(orig));
 
-	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
+	page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_grab_compound_head(pgd_page(orig), refs, flags);
-	if (!head)
+	folio = try_grab_folio(page, refs, flags);
+	if (!folio)
 		return 0;
 
 	if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
-		put_compound_head(head, refs, flags);
+		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
 
 	*nr += refs;
-	SetPageReferenced(head);
+	folio_set_referenced(folio);
 	return 1;
 }
 
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (12 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 13/17] gup: Convert gup_huge_pgd() " Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:32   ` Christoph Hellwig
  2022-01-05  8:17   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 15/17] gup: Convert for_each_compound_range() to gup_for_each_folio_range() Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

This macro can be considerably simplified by returning the folio from
gup_folio_next() instead of void from compound_next().  Convert both
callers to work on folios instead of pages.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 7bd1e4a2648a..eaffa6807609 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -239,31 +239,29 @@ static inline void compound_range_next(unsigned long i, unsigned long npages,
 	     __i < __npages; __i += __ntails, \
 	     compound_range_next(__i, __npages, __list, &(__head), &(__ntails)))
 
-static inline void compound_next(unsigned long i, unsigned long npages,
-				 struct page **list, struct page **head,
-				 unsigned int *ntails)
+static inline struct folio *gup_folio_next(unsigned long i,
+		unsigned long npages, struct page **list, unsigned int *ntails)
 {
-	struct page *page;
+	struct folio *folio;
 	unsigned int nr;
 
 	if (i >= npages)
-		return;
+		return NULL;
 
-	page = compound_head(list[i]);
+	folio = page_folio(list[i]);
 	for (nr = i + 1; nr < npages; nr++) {
-		if (compound_head(list[nr]) != page)
+		if (page_folio(list[nr]) != folio)
 			break;
 	}
 
-	*head = page;
 	*ntails = nr - i;
+	return folio;
 }
 
-#define for_each_compound_head(__i, __list, __npages, __head, __ntails) \
-	for (__i = 0, \
-	     compound_next(__i, __npages, __list, &(__head), &(__ntails)); \
-	     __i < __npages; __i += __ntails, \
-	     compound_next(__i, __npages, __list, &(__head), &(__ntails)))
+#define gup_for_each_folio(__i, __list, __npages, __folio, __ntails) \
+	for (__i = 0; \
+	     (__folio = gup_folio_next(__i, __npages, __list, &(__ntails))) != NULL; \
+	     __i += __ntails)
 
 /**
  * unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
@@ -291,15 +289,15 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 				 bool make_dirty)
 {
 	unsigned long index;
-	struct page *head;
-	unsigned int ntails;
+	struct folio *folio;
+	unsigned int nr;
 
 	if (!make_dirty) {
 		unpin_user_pages(pages, npages);
 		return;
 	}
 
-	for_each_compound_head(index, pages, npages, head, ntails) {
+	gup_for_each_folio(index, pages, npages, folio, nr) {
 		/*
 		 * Checking PageDirty at this point may race with
 		 * clear_page_dirty_for_io(), but that's OK. Two key
@@ -320,9 +318,12 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 		 * written back, so it gets written back again in the
 		 * next writeback cycle. This is harmless.
 		 */
-		if (!PageDirty(head))
-			set_page_dirty_lock(head);
-		put_compound_head(head, ntails, FOLL_PIN);
+		if (!folio_test_dirty(folio)) {
+			folio_lock(folio);
+			folio_mark_dirty(folio);
+			folio_unlock(folio);
+		}
+		gup_put_folio(folio, nr, FOLL_PIN);
 	}
 }
 EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
@@ -375,8 +376,8 @@ EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
 void unpin_user_pages(struct page **pages, unsigned long npages)
 {
 	unsigned long index;
-	struct page *head;
-	unsigned int ntails;
+	struct folio *folio;
+	unsigned int nr;
 
 	/*
 	 * If this WARN_ON() fires, then the system *might* be leaking pages (by
@@ -386,8 +387,8 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
 	if (WARN_ON(IS_ERR_VALUE(npages)))
 		return;
 
-	for_each_compound_head(index, pages, npages, head, ntails)
-		put_compound_head(head, ntails, FOLL_PIN);
+	gup_for_each_folio(index, pages, npages, folio, nr)
+		gup_put_folio(folio, nr, FOLL_PIN);
 }
 EXPORT_SYMBOL(unpin_user_pages);
 
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 15/17] gup: Convert for_each_compound_range() to gup_for_each_folio_range()
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (13 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio() Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:35   ` Christoph Hellwig
  2022-01-05  8:30   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 16/17] mm: Add isolate_lru_folio() Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

This macro can be considerably simplified by returning the folio from
gup_folio_range_next() instead of void from compound_next().  Convert the
only caller to work on folios instead of pages.

This removes the last caller of put_compound_head(), so delete it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 50 +++++++++++++++++++++++---------------------------
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index eaffa6807609..76717e05413d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -165,12 +165,6 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 	folio_put_refs(folio, refs);
 }
 
-static void put_compound_head(struct page *page, int refs, unsigned int flags)
-{
-	VM_BUG_ON_PAGE(PageTail(page), page);
-	gup_put_folio((struct folio *)page, refs, flags);
-}
-
 /**
  * try_grab_page() - elevate a page's refcount by a flag-dependent amount
  *
@@ -213,31 +207,30 @@ void unpin_user_page(struct page *page)
 }
 EXPORT_SYMBOL(unpin_user_page);
 
-static inline void compound_range_next(unsigned long i, unsigned long npages,
-				       struct page **list, struct page **head,
-				       unsigned int *ntails)
+static inline struct folio *gup_folio_range_next(unsigned long i,
+		unsigned long npages, struct page **list, unsigned int *ntails)
 {
-	struct page *next, *page;
+	struct page *next;
+	struct folio *folio;
 	unsigned int nr = 1;
 
 	if (i >= npages)
-		return;
+		return NULL;
 
 	next = *list + i;
-	page = compound_head(next);
-	if (PageCompound(page) && compound_order(page) >= 1)
-		nr = min_t(unsigned int,
-			   page + compound_nr(page) - next, npages - i);
+	folio = page_folio(next);
+	if (folio_test_large(folio))
+		nr = min_t(unsigned int, npages - i,
+			   &folio->page + folio_nr_pages(folio) - next);
 
-	*head = page;
 	*ntails = nr;
+	return folio;
 }
 
-#define for_each_compound_range(__i, __list, __npages, __head, __ntails) \
-	for (__i = 0, \
-	     compound_range_next(__i, __npages, __list, &(__head), &(__ntails)); \
-	     __i < __npages; __i += __ntails, \
-	     compound_range_next(__i, __npages, __list, &(__head), &(__ntails)))
+#define gup_for_each_folio_range(__i, __list, __npages, __folio, __ntails) \
+	for (__i = 0; \
+	     (__folio = gup_folio_range_next(__i, __npages, __list, &(__ntails))) != NULL; \
+	     __i += __ntails)
 
 static inline struct folio *gup_folio_next(unsigned long i,
 		unsigned long npages, struct page **list, unsigned int *ntails)
@@ -353,13 +346,16 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
 				      bool make_dirty)
 {
 	unsigned long index;
-	struct page *head;
-	unsigned int ntails;
+	struct folio *folio;
+	unsigned int nr;
 
-	for_each_compound_range(index, &page, npages, head, ntails) {
-		if (make_dirty && !PageDirty(head))
-			set_page_dirty_lock(head);
-		put_compound_head(head, ntails, FOLL_PIN);
+	gup_for_each_folio_range(index, &page, npages, folio, nr) {
+		if (make_dirty && !folio_test_dirty(folio)) {
+			folio_lock(folio);
+			folio_mark_dirty(folio);
+			folio_unlock(folio);
+		}
+		gup_put_folio(folio, nr, FOLL_PIN);
 	}
 }
 EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 16/17] mm: Add isolate_lru_folio()
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (14 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 15/17] gup: Convert for_each_compound_range() to gup_for_each_folio_range() Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:36   ` Christoph Hellwig
  2022-01-05  8:44   ` John Hubbard
  2022-01-02 21:57 ` [PATCH 17/17] gup: Convert check_and_migrate_movable_pages() to use a folio Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

Turn isolate_lru_page() into a wrapper around isolate_lru_folio().
TestClearPageLRU() would have always failed on a tail page, so
returning -EBUSY is the same behaviour.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/powerpc/include/asm/mmu_context.h |  1 -
 mm/folio-compat.c                      |  8 +++++
 mm/internal.h                          |  3 +-
 mm/vmscan.c                            | 43 ++++++++++++--------------
 4 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 9ba6b585337f..b9cab0a11421 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -21,7 +21,6 @@ extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
-extern int isolate_lru_page(struct page *page);	/* from internal.h */
 extern bool mm_iommu_preregistered(struct mm_struct *mm);
 extern long mm_iommu_new(struct mm_struct *mm,
 		unsigned long ua, unsigned long entries,
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 749555a232a8..782e766cd1ee 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -7,6 +7,7 @@
 #include <linux/migrate.h>
 #include <linux/pagemap.h>
 #include <linux/swap.h>
+#include "internal.h"
 
 struct address_space *page_mapping(struct page *page)
 {
@@ -151,3 +152,10 @@ int try_to_release_page(struct page *page, gfp_t gfp)
 	return filemap_release_folio(page_folio(page), gfp);
 }
 EXPORT_SYMBOL(try_to_release_page);
+
+int isolate_lru_page(struct page *page)
+{
+	if (WARN_RATELIMIT(PageTail(page), "trying to isolate tail page"))
+		return -EBUSY;
+	return isolate_lru_folio((struct folio *)page);
+}
diff --git a/mm/internal.h b/mm/internal.h
index e989d8ceec91..977d5116d327 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -178,7 +178,8 @@ extern unsigned long highest_memmap_pfn;
 /*
  * in mm/vmscan.c:
  */
-extern int isolate_lru_page(struct page *page);
+int isolate_lru_page(struct page *page);
+int isolate_lru_folio(struct folio *folio);
 extern void putback_lru_page(struct page *page);
 extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fb9584641ac7..ac2f5b76cdb2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2168,45 +2168,40 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 }
 
 /**
- * isolate_lru_page - tries to isolate a page from its LRU list
- * @page: page to isolate from its LRU list
+ * isolate_lru_folio - Try to isolate a folio from its LRU list.
+ * @folio: Folio to isolate from its LRU list.
  *
- * Isolates a @page from an LRU list, clears PageLRU and adjusts the
- * vmstat statistic corresponding to whatever LRU list the page was on.
+ * Isolate a @folio from an LRU list and adjust the vmstat statistic
+ * corresponding to whatever LRU list the folio was on.
  *
- * Returns 0 if the page was removed from an LRU list.
- * Returns -EBUSY if the page was not on an LRU list.
- *
- * The returned page will have PageLRU() cleared.  If it was found on
- * the active list, it will have PageActive set.  If it was found on
- * the unevictable list, it will have the PageUnevictable bit set. That flag
+ * The folio will have its LRU flag cleared.  If it was found on the
+ * active list, it will have the Active flag set.  If it was found on the
+ * unevictable list, it will have the Unevictable flag set.  These flags
  * may need to be cleared by the caller before letting the page go.
  *
- * The vmstat statistic corresponding to the list on which the page was
- * found will be decremented.
- *
- * Restrictions:
+ * Context:
  *
  * (1) Must be called with an elevated refcount on the page. This is a
- *     fundamental difference from isolate_lru_pages (which is called
+ *     fundamental difference from isolate_lru_pages() (which is called
  *     without a stable reference).
- * (2) the lru_lock must not be held.
- * (3) interrupts must be enabled.
+ * (2) The lru_lock must not be held.
+ * (3) Interrupts must be enabled.
+ *
+ * Return: 0 if the folio was removed from an LRU list.
+ * -EBUSY if the folio was not on an LRU list.
  */
-int isolate_lru_page(struct page *page)
+int isolate_lru_folio(struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
 	int ret = -EBUSY;
 
-	VM_BUG_ON_PAGE(!page_count(page), page);
-	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
+	VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio);
 
-	if (TestClearPageLRU(page)) {
+	if (folio_test_clear_lru(folio)) {
 		struct lruvec *lruvec;
 
-		get_page(page);
+		folio_get(folio);
 		lruvec = folio_lruvec_lock_irq(folio);
-		del_page_from_lru_list(page, lruvec);
+		lruvec_del_folio(lruvec, folio);
 		unlock_page_lruvec_irq(lruvec);
 		ret = 0;
 	}
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 17/17] gup: Convert check_and_migrate_movable_pages() to use a folio
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (15 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 16/17] mm: Add isolate_lru_folio() Matthew Wilcox (Oracle)
@ 2022-01-02 21:57 ` Matthew Wilcox (Oracle)
  2022-01-04  8:37   ` Christoph Hellwig
  2022-01-05  9:00   ` John Hubbard
  2022-01-06 22:12 ` [PATCH 00/17] Convert GUP to folios William Kucharski
  2022-01-07 18:54 ` Jason Gunthorpe
  18 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-02 21:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), John Hubbard, Andrew Morton

Switch from head pages to folios.  This removes an assumption that
THPs are the only way to have a high-order page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 76717e05413d..eb7c66e2b785 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1822,41 +1822,41 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
 	bool drain_allow = true;
 	LIST_HEAD(movable_page_list);
 	long ret = 0;
-	struct page *prev_head = NULL;
-	struct page *head;
+	struct folio *folio, *prev_folio = NULL;
 	struct migration_target_control mtc = {
 		.nid = NUMA_NO_NODE,
 		.gfp_mask = GFP_USER | __GFP_NOWARN,
 	};
 
 	for (i = 0; i < nr_pages; i++) {
-		head = compound_head(pages[i]);
-		if (head == prev_head)
+		folio = page_folio(pages[i]);
+		if (folio == prev_folio)
 			continue;
-		prev_head = head;
+		prev_folio = folio;
 		/*
 		 * If we get a movable page, since we are going to be pinning
 		 * these entries, try to move them out if possible.
 		 */
-		if (!is_pinnable_page(head)) {
-			if (PageHuge(head)) {
-				if (!isolate_huge_page(head, &movable_page_list))
+		if (!is_pinnable_page(&folio->page)) {
+			if (folio_test_hugetlb(folio)) {
+				if (!isolate_huge_page(&folio->page,
+							&movable_page_list))
 					isolation_error_count++;
 			} else {
-				if (!PageLRU(head) && drain_allow) {
+				if (!folio_test_lru(folio) && drain_allow) {
 					lru_add_drain_all();
 					drain_allow = false;
 				}
 
-				if (isolate_lru_page(head)) {
+				if (isolate_lru_folio(folio)) {
 					isolation_error_count++;
 					continue;
 				}
-				list_add_tail(&head->lru, &movable_page_list);
-				mod_node_page_state(page_pgdat(head),
+				list_add_tail(&folio->lru, &movable_page_list);
+				node_stat_mod_folio(folio,
 						    NR_ISOLATED_ANON +
-						    page_is_file_lru(head),
-						    thp_nr_pages(head));
+						    folio_is_file_lru(folio),
+						    folio_nr_pages(folio));
 			}
 		}
 	}
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 01/17] mm: Add folio_put_refs()
  2022-01-02 21:57 ` [PATCH 01/17] mm: Add folio_put_refs() Matthew Wilcox (Oracle)
@ 2022-01-04  8:00   ` Christoph Hellwig
  2022-01-04 21:15   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:00 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:13PM +0000, Matthew Wilcox (Oracle) wrote:
> This is like folio_put(), but puts N references at once instead of
> just one.  It's like put_page_refs(), but does one atomic operation
> instead of two, and is available to more than just gup.c.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 02/17] mm: Add folio_pincount_available()
  2022-01-02 21:57 ` [PATCH 02/17] mm: Add folio_pincount_available() Matthew Wilcox (Oracle)
@ 2022-01-04  8:01   ` Christoph Hellwig
  2022-01-04 18:25     ` Matthew Wilcox
  2022-01-04 21:40   ` John Hubbard
  1 sibling, 1 reply; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:01 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:14PM +0000, Matthew Wilcox (Oracle) wrote:
> -static inline bool hpage_pincount_available(struct page *page)
> +static inline bool folio_pincount_available(struct folio *folio)
>  {
>  	/*
> -	 * Can the page->hpage_pinned_refcount field be used? That field is in
> +	 * Can the folio->hpage_pinned_refcount field be used? That field is in
>  	 * the 3rd page of the compound page, so the smallest (2-page) compound
>  	 * pages cannot support it.
>  	 */
> +	return folio_order(folio) > 1;
> +}

Maybe convert the comment into a kerneldoc one?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 03/17] mm: Add folio_pincount_ptr()
  2022-01-02 21:57 ` [PATCH 03/17] mm: Add folio_pincount_ptr() Matthew Wilcox (Oracle)
@ 2022-01-04  8:02   ` Christoph Hellwig
  2022-01-04 21:43   ` John Hubbard
  2022-01-06 21:57   ` William Kucharski
  2 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:02 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:15PM +0000, Matthew Wilcox (Oracle) wrote:
> This is the folio equivalent of compound_pincount_ptr().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/mm_types.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index c3a6e6209600..09d9e2c4a2c5 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -309,6 +309,12 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
>  	return &page[1].compound_mapcount;
>  }
>  
> +static inline atomic_t *folio_pincount_ptr(struct folio *folio)
> +{
> +	struct page *tail = &folio->page + 2;
> +	return &tail->hpage_pinned_refcount;

Missing empty line?  Except for that and the fact that I dislike the
pre-existing magic numbers this looks fine.

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 04/17] mm: Convert page_maybe_dma_pinned() to use a folio
  2022-01-02 21:57 ` [PATCH 04/17] mm: Convert page_maybe_dma_pinned() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-04  8:03   ` Christoph Hellwig
  2022-01-04 22:01   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:03 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:16PM +0000, Matthew Wilcox (Oracle) wrote:
> Replaces three calls to compound_head() with one.  This removes the last
> user of compound_pincount(), so remove that helper too.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 05/17] gup: Add try_get_folio()
  2022-01-02 21:57 ` [PATCH 05/17] gup: Add try_get_folio() Matthew Wilcox (Oracle)
@ 2022-01-04  8:18   ` Christoph Hellwig
  2022-01-05  1:25   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:18 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:17PM +0000, Matthew Wilcox (Oracle) wrote:
> This replaces try_get_compound_head().  It includes a small optimisation
> for the race where a folio is split between being looked up from its
> tail page and the reference count being obtained.  Before, it returned
> NULL, which presumably triggered a retry under the mmap_lock, whereas
> now it will retry without the lock.  Finding a frozen page will still
> return NULL.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Although I think splitting out the rety optimization into a separate
patch would be nicer to document the change and to ease bisection if
that arises.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 06/17] mm: Remove page_cache_add_speculative() and page_cache_get_speculative()
  2022-01-02 21:57 ` [PATCH 06/17] mm: Remove page_cache_add_speculative() and page_cache_get_speculative() Matthew Wilcox (Oracle)
@ 2022-01-04  8:18   ` Christoph Hellwig
  2022-01-05  1:29   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:18 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 07/17] gup: Add gup_put_folio()
  2022-01-02 21:57 ` [PATCH 07/17] gup: Add gup_put_folio() Matthew Wilcox (Oracle)
@ 2022-01-04  8:22   ` Christoph Hellwig
  2022-01-05  6:52   ` John Hubbard
  2022-01-06 22:05   ` William Kucharski
  2 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:22 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:19PM +0000, Matthew Wilcox (Oracle) wrote:
> put_compound_head() is turned into a call to gup_put_folio().
> This removes the last call to put_page_refs(), so delete it.

Except for one more of those nasty page to folio casts (which I assume
will go away soon), this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 08/17] gup: Add try_grab_folio()
  2022-01-02 21:57 ` [PATCH 08/17] gup: Add try_grab_folio() Matthew Wilcox (Oracle)
@ 2022-01-04  8:24   ` Christoph Hellwig
  2022-01-05  7:06   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:24 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:20PM +0000, Matthew Wilcox (Oracle) wrote:
>  /**
> - * try_grab_compound_head() - attempt to elevate a page's refcount, by a
> + * try_grab_folio() - attempt to elevate a page's refcount, by a

s/page/folio/ ?

> - *
>   * @page:  pointer to page to be grabbed

and here something about page inside a folio?

Otherwise this looks fine, but I wonder if it would make more sense
to already introduce try_grab_folio earlier when you convert
try_grab_compound_head to use folios internally.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 09/17] gup: Convert gup_pte_range() to use a folio
  2022-01-02 21:57 ` [PATCH 09/17] gup: Convert gup_pte_range() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-04  8:25   ` Christoph Hellwig
  2022-01-05  7:36   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:25 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:21PM +0000, Matthew Wilcox (Oracle) wrote:
> We still call try_grab_folio() once per PTE; a future patch could
> optimise to just adjust the reference count for each page within
> the folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 10/17] gup: Convert gup_hugepte() to use a folio
  2022-01-02 21:57 ` [PATCH 10/17] gup: Convert gup_hugepte() " Matthew Wilcox (Oracle)
@ 2022-01-04  8:26   ` Christoph Hellwig
  2022-01-05  7:46   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:22PM +0000, Matthew Wilcox (Oracle) wrote:
> There should be little to no effect from this patch; just removing
> uses of some old APIs.
> 
> While I'm looking at this, take the opportunity to use nth_page()
> instead of doing the arithmetic ourselves in case hugetlbfs pages
> are ever allocated across memmap boundaries.

I'd split this into two patches, but otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 11/17] gup: Convert gup_huge_pmd() to use a folio
  2022-01-02 21:57 ` [PATCH 11/17] gup: Convert gup_huge_pmd() " Matthew Wilcox (Oracle)
@ 2022-01-04  8:26   ` Christoph Hellwig
  2022-01-05  7:50   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:23PM +0000, Matthew Wilcox (Oracle) wrote:
> Use the new folio-based APIs.  Also fix an assumption that memmap is
> contiguous.

The changes looks good, but I'd really split all these nth_page
conversions into a single prep patch.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()
  2022-01-02 21:57 ` [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio() Matthew Wilcox (Oracle)
@ 2022-01-04  8:32   ` Christoph Hellwig
  2022-01-05  8:17   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:32 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:26PM +0000, Matthew Wilcox (Oracle) wrote:
> This macro can be considerably simplified by returning the folio from
> gup_folio_next() instead of void from compound_next().  Convert both
> callers to work on folios instead of pages.

This looks sensible, but looking at the macro I wonder if an open
coded while loop (using your new calling conventions) wouldn't make
formore readable code than the macro:

	int i = 0;

	...

	while ((folio = gup_folio_next(i, npages, list, &ntail) != NULL)) {
		...
		i += ntails;
	}



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 15/17] gup: Convert for_each_compound_range() to gup_for_each_folio_range()
  2022-01-02 21:57 ` [PATCH 15/17] gup: Convert for_each_compound_range() to gup_for_each_folio_range() Matthew Wilcox (Oracle)
@ 2022-01-04  8:35   ` Christoph Hellwig
  2022-01-05  8:30   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:35 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:27PM +0000, Matthew Wilcox (Oracle) wrote:
> This macro can be considerably simplified by returning the folio from
> gup_folio_range_next() instead of void from compound_next().  Convert the
> only caller to work on folios instead of pages.
> 
> This removes the last caller of put_compound_head(), so delete it.

Looks good, but same comment about dropping the macro.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 16/17] mm: Add isolate_lru_folio()
  2022-01-02 21:57 ` [PATCH 16/17] mm: Add isolate_lru_folio() Matthew Wilcox (Oracle)
@ 2022-01-04  8:36   ` Christoph Hellwig
  2022-01-05  8:44   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 17/17] gup: Convert check_and_migrate_movable_pages() to use a folio
  2022-01-02 21:57 ` [PATCH 17/17] gup: Convert check_and_migrate_movable_pages() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-04  8:37   ` Christoph Hellwig
  2022-01-05  9:00   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2022-01-04  8:37 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 02/17] mm: Add folio_pincount_available()
  2022-01-04  8:01   ` Christoph Hellwig
@ 2022-01-04 18:25     ` Matthew Wilcox
  0 siblings, 0 replies; 72+ messages in thread
From: Matthew Wilcox @ 2022-01-04 18:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-mm, John Hubbard, Andrew Morton

On Tue, Jan 04, 2022 at 12:01:45AM -0800, Christoph Hellwig wrote:
> On Sun, Jan 02, 2022 at 09:57:14PM +0000, Matthew Wilcox (Oracle) wrote:
> > -static inline bool hpage_pincount_available(struct page *page)
> > +static inline bool folio_pincount_available(struct folio *folio)
> >  {
> >  	/*
> > -	 * Can the page->hpage_pinned_refcount field be used? That field is in
> > +	 * Can the folio->hpage_pinned_refcount field be used? That field is in
> >  	 * the 3rd page of the compound page, so the smallest (2-page) compound
> >  	 * pages cannot support it.
> >  	 */
> > +	return folio_order(folio) > 1;
> > +}
> 
> Maybe convert the comment into a kerneldoc one?

I'm actually thinking about getting rid of it by moving the
hpage_pinned_refcount into page[1]:

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4e763a590c9c..07fa8af564ed 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -950,7 +950,9 @@ static inline int head_compound_pincount(struct page *head)
 static inline void set_compound_order(struct page *page, unsigned int order)
 {
 	page[1].compound_order = order;
+#ifdef CONFIG_64BIT
 	page[1].compound_nr = 1U << order;
+#endif
 }
 
 /* Returns the number of pages in this potentially compound page. */
@@ -958,7 +960,11 @@ static inline unsigned long compound_nr(struct page *page)
 {
 	if (!PageHead(page))
 		return 1;
+#ifdef CONFIG_64BIT
 	return page[1].compound_nr;
+#else
+	return 1UL << compound_order(page);
+#endif
 }
 
 /* Returns the number of bytes in this potentially compound page. */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 09d9e2c4a2c5..e19af4a90a6c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -150,11 +150,14 @@ struct page {
 			unsigned char compound_dtor;
 			unsigned char compound_order;
 			atomic_t compound_mapcount;
+			atomic_t hpage_pinned_refcount;
+#ifdef CONFIG_64BIT
 			unsigned int compound_nr; /* 1 << compound_order */
+#endif
 		};
 		struct {	/* Second tail page of compound page */
 			unsigned long _compound_pad_1;	/* compound_head */
-			atomic_t hpage_pinned_refcount;
+			unsigned long _compound_pad_2;
 			/* For both global and memcg */
 			struct list_head deferred_list;
 		};

Now it's always available for every compound page.


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 01/17] mm: Add folio_put_refs()
  2022-01-02 21:57 ` [PATCH 01/17] mm: Add folio_put_refs() Matthew Wilcox (Oracle)
  2022-01-04  8:00   ` Christoph Hellwig
@ 2022-01-04 21:15   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-04 21:15 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> This is like folio_put(), but puts N references at once instead of
> just one.  It's like put_page_refs(), but does one atomic operation
> instead of two, and is available to more than just gup.c.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d8b7d7ed14dd..98a10412d581 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1237,6 +1237,26 @@ static inline void folio_put(struct folio *folio)
>   		__put_page(&folio->page);
>   }
>   
> +/**
> + * folio_put_refs - Reduce the reference count on a folio.
> + * @folio: The folio.
> + * @refs: The number of references to reduce.
> + *
> + * If the folio's reference count reaches zero, the memory will be
> + * released back to the page allocator and may be used by another
> + * allocation immediately.  Do not access the memory or the struct folio
> + * after calling folio_put_refs() unless you can be sure that these weren't
> + * the last references.
> + *
> + * Context: May be called in process or interrupt context, but not in NMI
> + * context.  May be called while holding a spinlock.

The context documentation is a nice touch.

Come to think of it, there probably aren't many mm functions that *can* be called
in NMI context. :)

> + */
> +static inline void folio_put_refs(struct folio *folio, int refs)
> +{
> +	if (folio_ref_sub_and_test(folio, refs))
> +		__put_page(&folio->page);
> +}
> +

Looks good, and definitely better than the previous put_page_refs() approach.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA


>   static inline void put_page(struct page *page)
>   {
>   	struct folio *folio = page_folio(page);



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 02/17] mm: Add folio_pincount_available()
  2022-01-02 21:57 ` [PATCH 02/17] mm: Add folio_pincount_available() Matthew Wilcox (Oracle)
  2022-01-04  8:01   ` Christoph Hellwig
@ 2022-01-04 21:40   ` John Hubbard
  2022-01-05  5:04     ` Matthew Wilcox
  1 sibling, 1 reply; 72+ messages in thread
From: John Hubbard @ 2022-01-04 21:40 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> Convert hpage_pincount_available() into folio_pincount_available() and
> turn hpage_pincount_available() into a wrapper.  We don't need to
> check folio_test_large() before checking folio_order() as
> folio_order() includes a check of folio_test_large().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h | 12 ++++++++----
>   1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 98a10412d581..269b5484d66e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -927,15 +927,19 @@ static inline void destroy_compound_page(struct page *page)
>   	compound_page_dtors[page[1].compound_dtor](page);
>   }
>   
> -static inline bool hpage_pincount_available(struct page *page)
> +static inline bool folio_pincount_available(struct folio *folio)
>   {
>   	/*
> -	 * Can the page->hpage_pinned_refcount field be used? That field is in
> +	 * Can the folio->hpage_pinned_refcount field be used? That field is in
>   	 * the 3rd page of the compound page, so the smallest (2-page) compound
>   	 * pages cannot support it.
>   	 */
> -	page = compound_head(page);
> -	return PageCompound(page) && compound_order(page) > 1;
> +	return folio_order(folio) > 1;

I see, no need to look at the compound page head, because folio_order() returns
zero for tail pages, and neatly avoids all of that.


> +}
> +
> +static inline bool hpage_pincount_available(struct page *page)
> +{
> +	return folio_pincount_available(page_folio(page));
>   }
>   
>   static inline int head_compound_pincount(struct page *head)


Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 03/17] mm: Add folio_pincount_ptr()
  2022-01-02 21:57 ` [PATCH 03/17] mm: Add folio_pincount_ptr() Matthew Wilcox (Oracle)
  2022-01-04  8:02   ` Christoph Hellwig
@ 2022-01-04 21:43   ` John Hubbard
  2022-01-06 21:57   ` William Kucharski
  2 siblings, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-04 21:43 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> This is the folio equivalent of compound_pincount_ptr().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm_types.h | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index c3a6e6209600..09d9e2c4a2c5 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -309,6 +309,12 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
>   	return &page[1].compound_mapcount;
>   }
>   
> +static inline atomic_t *folio_pincount_ptr(struct folio *folio)
> +{
> +	struct page *tail = &folio->page + 2;
> +	return &tail->hpage_pinned_refcount;
> +}
> +
>   static inline atomic_t *compound_pincount_ptr(struct page *page)
>   {
>   	return &page[2].hpage_pinned_refcount;


Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 04/17] mm: Convert page_maybe_dma_pinned() to use a folio
  2022-01-02 21:57 ` [PATCH 04/17] mm: Convert page_maybe_dma_pinned() to use a folio Matthew Wilcox (Oracle)
  2022-01-04  8:03   ` Christoph Hellwig
@ 2022-01-04 22:01   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-04 22:01 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> Replaces three calls to compound_head() with one.  This removes the last
> user of compound_pincount(), so remove that helper too.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h | 17 ++++++-----------
>   1 file changed, 6 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 269b5484d66e..00dcea53bb96 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -947,13 +947,6 @@ static inline int head_compound_pincount(struct page *head)
>   	return atomic_read(compound_pincount_ptr(head));
>   }
>   
> -static inline int compound_pincount(struct page *page)
> -{
> -	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
> -	page = compound_head(page);
> -	return head_compound_pincount(page);
> -}
> -

Yes, the only search hit remaining now is in a printk() string in mm/debug.c,
and that is still reasonable wording, even in the new world order, so I
think we're good:

mm/debug.c:96:  pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",


>   static inline void set_compound_order(struct page *page, unsigned int order)
>   {
>   	page[1].compound_order = order;
> @@ -1347,18 +1340,20 @@ void unpin_user_pages(struct page **pages, unsigned long npages);
>    */
>   static inline bool page_maybe_dma_pinned(struct page *page)
>   {
> -	if (hpage_pincount_available(page))
> -		return compound_pincount(page) > 0;
> +	struct folio *folio = page_folio(page);
> +
> +	if (folio_pincount_available(folio))
> +		return atomic_read(folio_pincount_ptr(folio)) > 0;
>   
>   	/*
>   	 * page_ref_count() is signed. If that refcount overflows, then
>   	 * page_ref_count() returns a negative value, and callers will avoid
>   	 * further incrementing the refcount.
>   	 *
> -	 * Here, for that overflow case, use the signed bit to count a little
> +	 * Here, for that overflow case, use the sign bit to count a little
>   	 * bit higher via unsigned math, and thus still get an accurate result.
>   	 */
> -	return ((unsigned int)page_ref_count(compound_head(page))) >=
> +	return ((unsigned int)folio_ref_count(folio)) >=
>   		GUP_PIN_COUNTING_BIAS;
>   }
>   

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 05/17] gup: Add try_get_folio()
  2022-01-02 21:57 ` [PATCH 05/17] gup: Add try_get_folio() Matthew Wilcox (Oracle)
  2022-01-04  8:18   ` Christoph Hellwig
@ 2022-01-05  1:25   ` John Hubbard
  2022-01-05  7:00     ` John Hubbard
                       ` (2 more replies)
  1 sibling, 3 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  1:25 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> This replaces try_get_compound_head().  It includes a small optimisation
> for the race where a folio is split between being looked up from its
> tail page and the reference count being obtained.  Before, it returned
> NULL, which presumably triggered a retry under the mmap_lock, whereas
> now it will retry without the lock.  Finding a frozen page will still
> return NULL.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 69 +++++++++++++++++++++++++++++---------------------------
>   1 file changed, 36 insertions(+), 33 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 2c51e9748a6a..58e5cfaaa676 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -29,12 +29,11 @@ struct follow_page_context {
>   	unsigned int page_mask;
>   };
>   
> -static void hpage_pincount_add(struct page *page, int refs)
> +static void folio_pincount_add(struct folio *folio, int refs)
>   {
> -	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
> -	VM_BUG_ON_PAGE(page != compound_head(page), page);
> +	VM_BUG_ON_FOLIO(!folio_pincount_available(folio), folio);
>   
> -	atomic_add(refs, compound_pincount_ptr(page));
> +	atomic_add(refs, folio_pincount_ptr(folio));
>   }
>   
>   static void hpage_pincount_sub(struct page *page, int refs)
> @@ -63,33 +62,35 @@ static void put_page_refs(struct page *page, int refs)
>   }
>   
>   /*
> - * Return the compound head page with ref appropriately incremented,
> + * Return the folio with ref appropriately incremented,
>    * or NULL if that failed.
>    */
> -static inline struct page *try_get_compound_head(struct page *page, int refs)
> +static inline struct folio *try_get_folio(struct page *page, int refs)
>   {
> -	struct page *head = compound_head(page);
> +	struct folio *folio;
>   
> -	if (WARN_ON_ONCE(page_ref_count(head) < 0))
> +retry:

Yes, this new retry looks like a solid improvement. Retrying at this low
level makes a lot of sense, given that it is racing with a very
transient sort of behavior.


> +	folio = page_folio(page);
> +	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
>   		return NULL;
> -	if (unlikely(!page_cache_add_speculative(head, refs)))
> +	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))

I'm a little lost about the meaning and intended use of the _rcu aspects
of folio_ref_try_add_rcu() here. For example, try_get_folio() does not
require that callers are in an rcu read section, right? This is probably
just a documentation question, sorry if it's obvious--I wasn't able to
work it out on my own.

>   		return NULL;
>   
>   	/*
> -	 * At this point we have a stable reference to the head page; but it
> -	 * could be that between the compound_head() lookup and the refcount
> -	 * increment, the compound page was split, in which case we'd end up
> -	 * holding a reference on a page that has nothing to do with the page
> +	 * At this point we have a stable reference to the folio; but it
> +	 * could be that between calling page_folio() and the refcount
> +	 * increment, the folio was split, in which case we'd end up
> +	 * holding a reference on a folio that has nothing to do with the page
>   	 * we were given anymore.
> -	 * So now that the head page is stable, recheck that the pages still
> -	 * belong together.
> +	 * So now that the folio is stable, recheck that the page still
> +	 * belongs to this folio.
>   	 */
> -	if (unlikely(compound_head(page) != head)) {
> -		put_page_refs(head, refs);
> -		return NULL;
> +	if (unlikely(page_folio(page) != folio)) {
> +		folio_put_refs(folio, refs);
> +		goto retry;
>   	}
>   
> -	return head;
> +	return folio;
>   }
>   
>   /**
> @@ -128,8 +129,10 @@ struct page *try_grab_compound_head(struct page *page,
>   				    int refs, unsigned int flags)
>   {
>   	if (flags & FOLL_GET)
> -		return try_get_compound_head(page, refs);
> +		return &try_get_folio(page, refs)->page;


Did you want to use folio_page() here, instead?


>   	else if (flags & FOLL_PIN) {
> +		struct folio *folio;
> +
>   		/*
>   		 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
>   		 * right zone, so fail and let the caller fall back to the slow
> @@ -143,29 +146,29 @@ struct page *try_grab_compound_head(struct page *page,
>   		 * CAUTION: Don't use compound_head() on the page before this
>   		 * point, the result won't be stable.
>   		 */
> -		page = try_get_compound_head(page, refs);
> -		if (!page)
> +		folio = try_get_folio(page, refs);
> +		if (!folio)
>   			return NULL;
>   
>   		/*
> -		 * When pinning a compound page of order > 1 (which is what
> +		 * When pinning a folio of order > 1 (which is what
>   		 * hpage_pincount_available() checks for), use an exact count to
> -		 * track it, via hpage_pincount_add/_sub().
> +		 * track it, via folio_pincount_add/_sub().
>   		 *
> -		 * However, be sure to *also* increment the normal page refcount
> -		 * field at least once, so that the page really is pinned.
> +		 * However, be sure to *also* increment the normal folio refcount
> +		 * field at least once, so that the folio really is pinned.
>   		 * That's why the refcount from the earlier
> -		 * try_get_compound_head() is left intact.
> +		 * try_get_folio() is left intact.
>   		 */
> -		if (hpage_pincount_available(page))
> -			hpage_pincount_add(page, refs);
> +		if (folio_pincount_available(folio))
> +			folio_pincount_add(folio, refs);
>   		else
> -			page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> +			folio_ref_add(folio,
> +					refs * (GUP_PIN_COUNTING_BIAS - 1));
>   
> -		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
> -				    refs);
> +		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
>   
> -		return page;
> +		return &folio->page;
>   	}
>   
>   	WARN_ON_ONCE(1);

Just minor questions above. This all looks solid though.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 06/17] mm: Remove page_cache_add_speculative() and page_cache_get_speculative()
  2022-01-02 21:57 ` [PATCH 06/17] mm: Remove page_cache_add_speculative() and page_cache_get_speculative() Matthew Wilcox (Oracle)
  2022-01-04  8:18   ` Christoph Hellwig
@ 2022-01-05  1:29   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  1:29 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> These wrappers have no more callers, so delete them.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h      |  7 +++----
>   include/linux/pagemap.h | 11 -----------
>   2 files changed, 3 insertions(+), 15 deletions(-)


Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 00dcea53bb96..602de23482ef 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1298,10 +1298,9 @@ static inline void put_page(struct page *page)
>    * applications that don't have huge page reference counts, this won't be an
>    * issue.
>    *
> - * Locking: the lockless algorithm described in page_cache_get_speculative()
> - * and page_cache_gup_pin_speculative() provides safe operation for
> - * get_user_pages and page_mkclean and other calls that race to set up page
> - * table entries.
> + * Locking: the lockless algorithm described in folio_try_get_rcu()
> + * provides safe operation for get_user_pages(), page_mkclean() and
> + * other calls that race to set up page table entries.
>    */
>   #define GUP_PIN_COUNTING_BIAS (1U << 10)
>   
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 704cb1b4b15d..4a63176b6417 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -283,17 +283,6 @@ static inline struct inode *folio_inode(struct folio *folio)
>   	return folio->mapping->host;
>   }
>   
> -static inline bool page_cache_add_speculative(struct page *page, int count)
> -{
> -	VM_BUG_ON_PAGE(PageTail(page), page);
> -	return folio_ref_try_add_rcu((struct folio *)page, count);
> -}
> -
> -static inline bool page_cache_get_speculative(struct page *page)
> -{
> -	return page_cache_add_speculative(page, 1);
> -}
> -
>   /**
>    * folio_attach_private - Attach private data to a folio.
>    * @folio: Folio to attach data to.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 02/17] mm: Add folio_pincount_available()
  2022-01-04 21:40   ` John Hubbard
@ 2022-01-05  5:04     ` Matthew Wilcox
  2022-01-05  6:24       ` John Hubbard
  0 siblings, 1 reply; 72+ messages in thread
From: Matthew Wilcox @ 2022-01-05  5:04 UTC (permalink / raw)
  To: John Hubbard; +Cc: linux-mm, Andrew Morton

On Tue, Jan 04, 2022 at 01:40:36PM -0800, John Hubbard wrote:
> > -	page = compound_head(page);
> > -	return PageCompound(page) && compound_order(page) > 1;
> > +	return folio_order(folio) > 1;
> 
> I see, no need to look at the compound page head, because folio_order() returns
> zero for tail pages, and neatly avoids all of that.

Did you mean base pages instead of tail pages?  A folio can never be a
tail page.



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 02/17] mm: Add folio_pincount_available()
  2022-01-05  5:04     ` Matthew Wilcox
@ 2022-01-05  6:24       ` John Hubbard
  0 siblings, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  6:24 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, Andrew Morton

On 1/4/22 21:04, Matthew Wilcox wrote:
> On Tue, Jan 04, 2022 at 01:40:36PM -0800, John Hubbard wrote:
>>> -	page = compound_head(page);
>>> -	return PageCompound(page) && compound_order(page) > 1;
>>> +	return folio_order(folio) > 1;
>>
>> I see, no need to look at the compound page head, because folio_order() returns
>> zero for tail pages, and neatly avoids all of that.
> 
> Did you mean base pages instead of tail pages?  A folio can never be a
> tail page.
> 

Oh right. I was thinking (incorrectly) that there might be some cases during
the page-to-folio conversion in which a tail page could sort of slip through,
but good point.


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 07/17] gup: Add gup_put_folio()
  2022-01-02 21:57 ` [PATCH 07/17] gup: Add gup_put_folio() Matthew Wilcox (Oracle)
  2022-01-04  8:22   ` Christoph Hellwig
@ 2022-01-05  6:52   ` John Hubbard
  2022-01-06 22:05   ` William Kucharski
  2 siblings, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  6:52 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> put_compound_head() is turned into a call to gup_put_folio().
> This removes the last call to put_page_refs(), so delete it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 44 +++++++++++++++-----------------------------
>   1 file changed, 15 insertions(+), 29 deletions(-)
> 

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> diff --git a/mm/gup.c b/mm/gup.c
> index 58e5cfaaa676..6d827f7d66d8 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -36,29 +36,11 @@ static void folio_pincount_add(struct folio *folio, int refs)
>   	atomic_add(refs, folio_pincount_ptr(folio));
>   }
>   
> -static void hpage_pincount_sub(struct page *page, int refs)
> +static void folio_pincount_sub(struct folio *folio, int refs)
>   {
> -	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
> -	VM_BUG_ON_PAGE(page != compound_head(page), page);
> -
> -	atomic_sub(refs, compound_pincount_ptr(page));
> -}
> -
> -/* Equivalent to calling put_page() @refs times. */
> -static void put_page_refs(struct page *page, int refs)
> -{
> -#ifdef CONFIG_DEBUG_VM
> -	if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
> -		return;
> -#endif
> +	VM_BUG_ON_FOLIO(!folio_pincount_available(folio), folio);
>   
> -	/*
> -	 * Calling put_page() for each ref is unnecessarily slow. Only the last
> -	 * ref needs a put_page().
> -	 */
> -	if (refs > 1)
> -		page_ref_sub(page, refs - 1);
> -	put_page(page);
> +	atomic_sub(refs, folio_pincount_ptr(folio));
>   }
>   
>   /*
> @@ -175,19 +157,23 @@ struct page *try_grab_compound_head(struct page *page,
>   	return NULL;
>   }
>   
> -static void put_compound_head(struct page *page, int refs, unsigned int flags)
> +static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
>   {
>   	if (flags & FOLL_PIN) {
> -		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
> -				    refs);
> -
> -		if (hpage_pincount_available(page))
> -			hpage_pincount_sub(page, refs);
> +		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
> +		if (folio_pincount_available(folio))
> +			folio_pincount_sub(folio, refs);
>   		else
>   			refs *= GUP_PIN_COUNTING_BIAS;
>   	}
>   
> -	put_page_refs(page, refs);
> +	folio_put_refs(folio, refs);
> +}
> +
> +static void put_compound_head(struct page *page, int refs, unsigned int flags)
> +{
> +	VM_BUG_ON_PAGE(PageTail(page), page);
> +	gup_put_folio((struct folio *)page, refs, flags);
>   }
>   
>   /**
> @@ -228,7 +214,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
>    */
>   void unpin_user_page(struct page *page)
>   {
> -	put_compound_head(compound_head(page), 1, FOLL_PIN);
> +	gup_put_folio(page_folio(page), 1, FOLL_PIN);
>   }
>   EXPORT_SYMBOL(unpin_user_page);
>   


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 05/17] gup: Add try_get_folio()
  2022-01-05  1:25   ` John Hubbard
@ 2022-01-05  7:00     ` John Hubbard
  2022-01-07 18:23     ` Jason Gunthorpe
  2022-01-08  1:37     ` Matthew Wilcox
  2 siblings, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  7:00 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/4/22 17:25, John Hubbard wrote:
>> @@ -128,8 +129,10 @@ struct page *try_grab_compound_head(struct page *page,
>>                       int refs, unsigned int flags)
>>   {
>>       if (flags & FOLL_GET)
>> -        return try_get_compound_head(page, refs);
>> +        return &try_get_folio(page, refs)->page;
> 
> 
> Did you want to use folio_page() here, instead?
> 
...never mind about that, because I see that patch 08 gets rid of the
"->page", anyway.


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 08/17] gup: Add try_grab_folio()
  2022-01-02 21:57 ` [PATCH 08/17] gup: Add try_grab_folio() Matthew Wilcox (Oracle)
  2022-01-04  8:24   ` Christoph Hellwig
@ 2022-01-05  7:06   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  7:06 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> try_grab_compound_head() is turned into a call to try_grab_folio().
> Convert the two callers who only care about a boolean success/fail.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h |  4 +---
>   mm/gup.c           | 25 +++++++++++++------------
>   mm/hugetlb.c       |  7 +++----
>   3 files changed, 17 insertions(+), 19 deletions(-)
> 

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 602de23482ef..4e763a590c9c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1202,9 +1202,7 @@ static inline void get_page(struct page *page)
>   }
>   
>   bool __must_check try_grab_page(struct page *page, unsigned int flags);
> -struct page *try_grab_compound_head(struct page *page, int refs,
> -				    unsigned int flags);
> -
> +struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
>   
>   static inline __must_check bool try_get_page(struct page *page)
>   {
> diff --git a/mm/gup.c b/mm/gup.c
> index 6d827f7d66d8..2307b2917055 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -76,12 +76,8 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
>   }
>   
>   /**
> - * try_grab_compound_head() - attempt to elevate a page's refcount, by a
> + * try_grab_folio() - attempt to elevate a page's refcount, by a
>    * flags-dependent amount.
> - *
> - * Even though the name includes "compound_head", this function is still
> - * appropriate for callers that have a non-compound @page to get.
> - *
>    * @page:  pointer to page to be grabbed
>    * @refs:  the value to (effectively) add to the page's refcount
>    * @flags: gup flags: these are the FOLL_* flag values.
> @@ -102,16 +98,15 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
>    *    FOLL_PIN on normal pages, or compound pages that are two pages long:
>    *    page's refcount will be incremented by @refs * GUP_PIN_COUNTING_BIAS.
>    *
> - * Return: head page (with refcount appropriately incremented) for success, or
> + * Return: folio (with refcount appropriately incremented) for success, or
>    * NULL upon failure. If neither FOLL_GET nor FOLL_PIN was set, that's
>    * considered failure, and furthermore, a likely bug in the caller, so a warning
>    * is also emitted.
>    */
> -struct page *try_grab_compound_head(struct page *page,
> -				    int refs, unsigned int flags)
> +struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
>   {
>   	if (flags & FOLL_GET)
> -		return &try_get_folio(page, refs)->page;
> +		return try_get_folio(page, refs);
>   	else if (flags & FOLL_PIN) {
>   		struct folio *folio;
>   
> @@ -150,13 +145,19 @@ struct page *try_grab_compound_head(struct page *page,
>   
>   		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
>   
> -		return &folio->page;
> +		return folio;
>   	}
>   
>   	WARN_ON_ONCE(1);
>   	return NULL;
>   }
>   
> +static inline struct page *try_grab_compound_head(struct page *page,
> +		int refs, unsigned int flags)
> +{
> +	return &try_grab_folio(page, refs, flags)->page;
> +}
> +
>   static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
>   {
>   	if (flags & FOLL_PIN) {
> @@ -188,7 +189,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
>    * @flags:   gup flags: these are the FOLL_* flag values.
>    *
>    * Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same
> - * time. Cases: please see the try_grab_compound_head() documentation, with
> + * time. Cases: please see the try_grab_folio() documentation, with
>    * "refs=1".
>    *
>    * Return: true for success, or if no action was required (if neither FOLL_PIN
> @@ -200,7 +201,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
>   	if (!(flags & (FOLL_GET | FOLL_PIN)))
>   		return true;
>   
> -	return try_grab_compound_head(page, 1, flags);
> +	return try_grab_folio(page, 1, flags);
>   }
>   
>   /**
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index abcd1785c629..ab67b13c4a71 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6072,7 +6072,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   
>   		if (pages) {
>   			/*
> -			 * try_grab_compound_head() should always succeed here,
> +			 * try_grab_folio() should always succeed here,
>   			 * because: a) we hold the ptl lock, and b) we've just
>   			 * checked that the huge page is present in the page
>   			 * tables. If the huge page is present, then the tail
> @@ -6081,9 +6081,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   			 * any way. So this page must be available at this
>   			 * point, unless the page refcount overflowed:
>   			 */
> -			if (WARN_ON_ONCE(!try_grab_compound_head(pages[i],
> -								 refs,
> -								 flags))) {
> +			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
> +							 flags))) {
>   				spin_unlock(ptl);
>   				remainder = 0;
>   				err = -ENOMEM;



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 09/17] gup: Convert gup_pte_range() to use a folio
  2022-01-02 21:57 ` [PATCH 09/17] gup: Convert gup_pte_range() to use a folio Matthew Wilcox (Oracle)
  2022-01-04  8:25   ` Christoph Hellwig
@ 2022-01-05  7:36   ` John Hubbard
  2022-01-05  7:52     ` Matthew Wilcox
  1 sibling, 1 reply; 72+ messages in thread
From: John Hubbard @ 2022-01-05  7:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> We still call try_grab_folio() once per PTE; a future patch could
> optimise to just adjust the reference count for each page within
> the folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 16 +++++++---------
>   1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 2307b2917055..d8535f9d5622 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2260,7 +2260,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>   	ptem = ptep = pte_offset_map(&pmd, addr);
>   	do {
>   		pte_t pte = ptep_get_lockless(ptep);
> -		struct page *head, *page;
> +		struct page *page;
> +		struct folio *folio;
>   
>   		/*
>   		 * Similar to the PMD case below, NUMA hinting must take slow
> @@ -2287,22 +2288,20 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>   		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
>   		page = pte_page(pte);
>   
> -		head = try_grab_compound_head(page, 1, flags);
> -		if (!head)
> +		folio = try_grab_folio(page, 1, flags);
> +		if (!folio)
>   			goto pte_unmap;
>   
>   		if (unlikely(page_is_secretmem(page))) {
> -			put_compound_head(head, 1, flags);
> +			gup_put_folio(folio, 1, flags);
>   			goto pte_unmap;
>   		}
>   
>   		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> -			put_compound_head(head, 1, flags);
> +			gup_put_folio(folio, 1, flags);
>   			goto pte_unmap;
>   		}
>   
> -		VM_BUG_ON_PAGE(compound_head(page) != head, page);
> -
>   		/*
>   		 * We need to make the page accessible if and only if we are
>   		 * going to access its content (the FOLL_PIN case).  Please
> @@ -2316,10 +2315,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>   				goto pte_unmap;
>   			}
>   		}
> -		SetPageReferenced(page);
> +		folio_set_referenced(folio);

For the case of a tail page, I *think* the above hunk changes the
behavior. Previously, the tail page's flags were affected, but now, the
head page's (folio's) page flags are being changed...right?


thanks,
-- 
John Hubbard
NVIDIA

>   		pages[*nr] = page;
>   		(*nr)++;
> -
>   	} while (ptep++, addr += PAGE_SIZE, addr != end);
>   
>   	ret = 1;



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 10/17] gup: Convert gup_hugepte() to use a folio
  2022-01-02 21:57 ` [PATCH 10/17] gup: Convert gup_hugepte() " Matthew Wilcox (Oracle)
  2022-01-04  8:26   ` Christoph Hellwig
@ 2022-01-05  7:46   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  7:46 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> There should be little to no effect from this patch; just removing
> uses of some old APIs.
> 
> While I'm looking at this, take the opportunity to use nth_page()
> instead of doing the arithmetic ourselves in case hugetlbfs pages
> are ever allocated across memmap boundaries.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 16 ++++++++--------
>   1 file changed, 8 insertions(+), 8 deletions(-)
> 

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> diff --git a/mm/gup.c b/mm/gup.c
> index d8535f9d5622..1c7fb668b46d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2435,7 +2435,7 @@ static int record_subpages(struct page *page, unsigned long addr,
>   	int nr;
>   
>   	for (nr = 0; addr != end; addr += PAGE_SIZE)
> -		pages[nr++] = page++;
> +		pages[nr++] = nth_page(page, nr);
>   
>   	return nr;
>   }
> @@ -2453,7 +2453,8 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
>   		       struct page **pages, int *nr)
>   {
>   	unsigned long pte_end;
> -	struct page *head, *page;
> +	struct page *page;
> +	struct folio *folio;
>   	pte_t pte;
>   	int refs;
>   
> @@ -2469,21 +2470,20 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
>   	/* hugepages are never "special" */
>   	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
>   
> -	head = pte_page(pte);
> -	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
> +	page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
> -	head = try_grab_compound_head(head, refs, flags);
> -	if (!head)
> +	folio = try_grab_folio(page, refs, flags);
> +	if (!folio)
>   		return 0;
>   
>   	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> -		put_compound_head(head, refs, flags);
> +		gup_put_folio(folio, refs, flags);
>   		return 0;
>   	}
>   
>   	*nr += refs;
> -	SetPageReferenced(head);
> +	folio_set_referenced(folio);
>   	return 1;
>   }
>   



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 11/17] gup: Convert gup_huge_pmd() to use a folio
  2022-01-02 21:57 ` [PATCH 11/17] gup: Convert gup_huge_pmd() " Matthew Wilcox (Oracle)
  2022-01-04  8:26   ` Christoph Hellwig
@ 2022-01-05  7:50   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  7:50 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> Use the new folio-based APIs.  Also fix an assumption that memmap is
> contiguous.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 1c7fb668b46d..be965c965484 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2517,7 +2517,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>   			unsigned long end, unsigned int flags,
>   			struct page **pages, int *nr)
>   {
> -	struct page *head, *page;
> +	struct page *page;
> +	struct folio *folio;
>   	int refs;
>   
>   	if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
> @@ -2530,20 +2531,20 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>   					     pages, nr);
>   	}
>   
> -	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> +	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
> -	head = try_grab_compound_head(pmd_page(orig), refs, flags);
> -	if (!head)
> +	folio = try_grab_folio(page, refs, flags);
> +	if (!folio)
>   		return 0;
>   
>   	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
> -		put_compound_head(head, refs, flags);
> +		gup_put_folio(folio, refs, flags);
>   		return 0;
>   	}
>   
>   	*nr += refs;
> -	SetPageReferenced(head);
> +	folio_set_referenced(folio);
>   	return 1;
>   }
>   


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 09/17] gup: Convert gup_pte_range() to use a folio
  2022-01-05  7:36   ` John Hubbard
@ 2022-01-05  7:52     ` Matthew Wilcox
  2022-01-05  7:57       ` John Hubbard
  0 siblings, 1 reply; 72+ messages in thread
From: Matthew Wilcox @ 2022-01-05  7:52 UTC (permalink / raw)
  To: John Hubbard; +Cc: linux-mm, Andrew Morton

On Tue, Jan 04, 2022 at 11:36:29PM -0800, John Hubbard wrote:
> > -		SetPageReferenced(page);
> > +		folio_set_referenced(folio);
> 
> For the case of a tail page, I *think* the above hunk changes the
> behavior. Previously, the tail page's flags were affected, but now, the
> head page's (folio's) page flags are being changed...right?

So ... most flags don't really "exist" for tail pages.

PAGEFLAG(Referenced, referenced, PF_HEAD)

#define PF_HEAD(page, enforce)  PF_POISONED_CHECK(compound_head(page))

so any time you call SetPageReferenced(page), we have a hidden call to
compound_head() in order to set PG_referenced on the head page.

It's only the PF_ANY flags which are (theoretically) settable on tail
pages:

PAGEFLAG(Private, private, PF_ANY)
PAGEFLAG(Private2, private_2, PF_ANY)
PAGEFLAG(OwnerPriv1, owner_priv_1, PF_ANY)
PAGEFLAG(HWPoison, hwpoison, PF_ANY)
TESTPAGEFLAG(Young, young, PF_ANY)
PAGEFLAG(Idle, idle, PF_ANY)
__PAGEFLAG(Head, head, PF_ANY)
__PAGEFLAG(Isolated, isolated, PF_ANY)

I honestly doubt many of these are actually settable on tail pages;
it's simply that nobody's really thought about them for compound pages.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 09/17] gup: Convert gup_pte_range() to use a folio
  2022-01-05  7:52     ` Matthew Wilcox
@ 2022-01-05  7:57       ` John Hubbard
  0 siblings, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  7:57 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, Andrew Morton

On 1/4/22 23:52, Matthew Wilcox wrote:
> On Tue, Jan 04, 2022 at 11:36:29PM -0800, John Hubbard wrote:
>>> -		SetPageReferenced(page);
>>> +		folio_set_referenced(folio);
>>
>> For the case of a tail page, I *think* the above hunk changes the
>> behavior. Previously, the tail page's flags were affected, but now, the
>> head page's (folio's) page flags are being changed...right?
> 
> So ... most flags don't really "exist" for tail pages.
> 
> PAGEFLAG(Referenced, referenced, PF_HEAD)
> 
> #define PF_HEAD(page, enforce)  PF_POISONED_CHECK(compound_head(page))
> 
> so any time you call SetPageReferenced(page), we have a hidden call to
> compound_head() in order to set PG_referenced on the head page.
> 
> It's only the PF_ANY flags which are (theoretically) settable on tail
> pages:
> 
> PAGEFLAG(Private, private, PF_ANY)
> PAGEFLAG(Private2, private_2, PF_ANY)
> PAGEFLAG(OwnerPriv1, owner_priv_1, PF_ANY)
> PAGEFLAG(HWPoison, hwpoison, PF_ANY)
> TESTPAGEFLAG(Young, young, PF_ANY)
> PAGEFLAG(Idle, idle, PF_ANY)
> __PAGEFLAG(Head, head, PF_ANY)
> __PAGEFLAG(Isolated, isolated, PF_ANY)
> 
> I honestly doubt many of these are actually settable on tail pages;
> it's simply that nobody's really thought about them for compound pages.

OK, I see. Thanks for spelling that out for me. :)

Please feel free to add:

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 12/17] gup: Convert gup_huge_pud() to use a folio
  2022-01-02 21:57 ` [PATCH 12/17] gup: Convert gup_huge_pud() " Matthew Wilcox (Oracle)
@ 2022-01-05  7:58   ` John Hubbard
  0 siblings, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  7:58 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> Use the new folio-based APIs.  Also fix an assumption that memmap is
> contiguous.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)
> 

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> diff --git a/mm/gup.c b/mm/gup.c
> index be965c965484..e7bcee8776e1 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2552,7 +2552,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>   			unsigned long end, unsigned int flags,
>   			struct page **pages, int *nr)
>   {
> -	struct page *head, *page;
> +	struct page *page;
> +	struct folio *folio;
>   	int refs;
>   
>   	if (!pud_access_permitted(orig, flags & FOLL_WRITE))
> @@ -2565,20 +2566,20 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>   					     pages, nr);
>   	}
>   
> -	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> +	page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
> -	head = try_grab_compound_head(pud_page(orig), refs, flags);
> -	if (!head)
> +	folio = try_grab_folio(page, refs, flags);
> +	if (!folio)
>   		return 0;
>   
>   	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
> -		put_compound_head(head, refs, flags);
> +		gup_put_folio(folio, refs, flags);
>   		return 0;
>   	}
>   
>   	*nr += refs;
> -	SetPageReferenced(head);
> +	folio_set_referenced(folio);
>   	return 1;
>   }
>   


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 13/17] gup: Convert gup_huge_pgd() to use a folio
  2022-01-02 21:57 ` [PATCH 13/17] gup: Convert gup_huge_pgd() " Matthew Wilcox (Oracle)
@ 2022-01-05  7:58   ` John Hubbard
  0 siblings, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  7:58 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> Use the new folio-based APIs.  Also fix an assumption that memmap is
> contiguous.  This was the last user of try_grab_compound_head(),
> so remove it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 19 +++++++------------
>   1 file changed, 7 insertions(+), 12 deletions(-)

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> 
> diff --git a/mm/gup.c b/mm/gup.c
> index e7bcee8776e1..7bd1e4a2648a 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -152,12 +152,6 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
>   	return NULL;
>   }
>   
> -static inline struct page *try_grab_compound_head(struct page *page,
> -		int refs, unsigned int flags)
> -{
> -	return &try_grab_folio(page, refs, flags)->page;
> -}
> -
>   static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
>   {
>   	if (flags & FOLL_PIN) {
> @@ -2588,27 +2582,28 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
>   			struct page **pages, int *nr)
>   {
>   	int refs;
> -	struct page *head, *page;
> +	struct page *page;
> +	struct folio *folio;
>   
>   	if (!pgd_access_permitted(orig, flags & FOLL_WRITE))
>   		return 0;
>   
>   	BUILD_BUG_ON(pgd_devmap(orig));
>   
> -	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
> +	page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
> -	head = try_grab_compound_head(pgd_page(orig), refs, flags);
> -	if (!head)
> +	folio = try_grab_folio(page, refs, flags);
> +	if (!folio)
>   		return 0;
>   
>   	if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
> -		put_compound_head(head, refs, flags);
> +		gup_put_folio(folio, refs, flags);
>   		return 0;
>   	}
>   
>   	*nr += refs;
> -	SetPageReferenced(head);
> +	folio_set_referenced(folio);
>   	return 1;
>   }
>   


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()
  2022-01-02 21:57 ` [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio() Matthew Wilcox (Oracle)
  2022-01-04  8:32   ` Christoph Hellwig
@ 2022-01-05  8:17   ` John Hubbard
  2022-01-09  4:39     ` Matthew Wilcox
  1 sibling, 1 reply; 72+ messages in thread
From: John Hubbard @ 2022-01-05  8:17 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> This macro can be considerably simplified by returning the folio from
> gup_folio_next() instead of void from compound_next().  Convert both
> callers to work on folios instead of pages.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 47 ++++++++++++++++++++++++-----------------------
>   1 file changed, 24 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 7bd1e4a2648a..eaffa6807609 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -239,31 +239,29 @@ static inline void compound_range_next(unsigned long i, unsigned long npages,
>   	     __i < __npages; __i += __ntails, \
>   	     compound_range_next(__i, __npages, __list, &(__head), &(__ntails)))
>   
> -static inline void compound_next(unsigned long i, unsigned long npages,
> -				 struct page **list, struct page **head,
> -				 unsigned int *ntails)
> +static inline struct folio *gup_folio_next(unsigned long i,
> +		unsigned long npages, struct page **list, unsigned int *ntails)
>   {
> -	struct page *page;
> +	struct folio *folio;
>   	unsigned int nr;
>   
>   	if (i >= npages)
> -		return;
> +		return NULL;
>   
> -	page = compound_head(list[i]);
> +	folio = page_folio(list[i]);
>   	for (nr = i + 1; nr < npages; nr++) {
> -		if (compound_head(list[nr]) != page)
> +		if (page_folio(list[nr]) != folio)
>   			break;
>   	}
>   
> -	*head = page;
>   	*ntails = nr - i;
> +	return folio;
>   }
>   
> -#define for_each_compound_head(__i, __list, __npages, __head, __ntails) \
> -	for (__i = 0, \
> -	     compound_next(__i, __npages, __list, &(__head), &(__ntails)); \
> -	     __i < __npages; __i += __ntails, \
> -	     compound_next(__i, __npages, __list, &(__head), &(__ntails)))
> +#define gup_for_each_folio(__i, __list, __npages, __folio, __ntails) \
> +	for (__i = 0; \
> +	     (__folio = gup_folio_next(__i, __npages, __list, &(__ntails))) != NULL; \
> +	     __i += __ntails)


This is nice. I find these pre-existing macros to be really quite
horrible, but I was unable to suggest anything better at the time, so
it's good to see the simplification. :)

>   
>   /**
>    * unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
> @@ -291,15 +289,15 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>   				 bool make_dirty)
>   {
>   	unsigned long index;
> -	struct page *head;
> -	unsigned int ntails;
> +	struct folio *folio;
> +	unsigned int nr;
>   
>   	if (!make_dirty) {
>   		unpin_user_pages(pages, npages);
>   		return;
>   	}
>   
> -	for_each_compound_head(index, pages, npages, head, ntails) {
> +	gup_for_each_folio(index, pages, npages, folio, nr) {
>   		/*
>   		 * Checking PageDirty at this point may race with
>   		 * clear_page_dirty_for_io(), but that's OK. Two key
> @@ -320,9 +318,12 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>   		 * written back, so it gets written back again in the
>   		 * next writeback cycle. This is harmless.
>   		 */
> -		if (!PageDirty(head))
> -			set_page_dirty_lock(head);
> -		put_compound_head(head, ntails, FOLL_PIN);
> +		if (!folio_test_dirty(folio)) {
> +			folio_lock(folio);
> +			folio_mark_dirty(folio);
> +			folio_unlock(folio);

At some point, maybe even here, I suspect that creating the folio
version of set_page_dirty_lock() would help. I'm sure you have
a better feel for whether it helps, after doing all of this conversion
work, but it just sort of jumped out at me as surprising to see it
in this form.

In any case, this all looks correct, so


Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

> +		}
> +		gup_put_folio(folio, nr, FOLL_PIN);
>   	}
>   }
>   EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
> @@ -375,8 +376,8 @@ EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
>   void unpin_user_pages(struct page **pages, unsigned long npages)
>   {
>   	unsigned long index;
> -	struct page *head;
> -	unsigned int ntails;
> +	struct folio *folio;
> +	unsigned int nr;
>   
>   	/*
>   	 * If this WARN_ON() fires, then the system *might* be leaking pages (by
> @@ -386,8 +387,8 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
>   	if (WARN_ON(IS_ERR_VALUE(npages)))
>   		return;
>   
> -	for_each_compound_head(index, pages, npages, head, ntails)
> -		put_compound_head(head, ntails, FOLL_PIN);
> +	gup_for_each_folio(index, pages, npages, folio, nr)
> +		gup_put_folio(folio, nr, FOLL_PIN);
>   }
>   EXPORT_SYMBOL(unpin_user_pages);
>   



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 15/17] gup: Convert for_each_compound_range() to gup_for_each_folio_range()
  2022-01-02 21:57 ` [PATCH 15/17] gup: Convert for_each_compound_range() to gup_for_each_folio_range() Matthew Wilcox (Oracle)
  2022-01-04  8:35   ` Christoph Hellwig
@ 2022-01-05  8:30   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  8:30 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> This macro can be considerably simplified by returning the folio from
> gup_folio_range_next() instead of void from compound_next().  Convert the
> only caller to work on folios instead of pages.
> 
> This removes the last caller of put_compound_head(), so delete it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 50 +++++++++++++++++++++++---------------------------
>   1 file changed, 23 insertions(+), 27 deletions(-)
> 

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> diff --git a/mm/gup.c b/mm/gup.c
> index eaffa6807609..76717e05413d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -165,12 +165,6 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
>   	folio_put_refs(folio, refs);
>   }
>   
> -static void put_compound_head(struct page *page, int refs, unsigned int flags)
> -{
> -	VM_BUG_ON_PAGE(PageTail(page), page);
> -	gup_put_folio((struct folio *)page, refs, flags);
> -}
> -
>   /**
>    * try_grab_page() - elevate a page's refcount by a flag-dependent amount
>    *
> @@ -213,31 +207,30 @@ void unpin_user_page(struct page *page)
>   }
>   EXPORT_SYMBOL(unpin_user_page);
>   
> -static inline void compound_range_next(unsigned long i, unsigned long npages,
> -				       struct page **list, struct page **head,
> -				       unsigned int *ntails)
> +static inline struct folio *gup_folio_range_next(unsigned long i,
> +		unsigned long npages, struct page **list, unsigned int *ntails)
>   {
> -	struct page *next, *page;
> +	struct page *next;
> +	struct folio *folio;
>   	unsigned int nr = 1;
>   
>   	if (i >= npages)
> -		return;
> +		return NULL;
>   
>   	next = *list + i;
> -	page = compound_head(next);
> -	if (PageCompound(page) && compound_order(page) >= 1)
> -		nr = min_t(unsigned int,
> -			   page + compound_nr(page) - next, npages - i);
> +	folio = page_folio(next);
> +	if (folio_test_large(folio))
> +		nr = min_t(unsigned int, npages - i,
> +			   &folio->page + folio_nr_pages(folio) - next);
>   
> -	*head = page;
>   	*ntails = nr;
> +	return folio;
>   }
>   
> -#define for_each_compound_range(__i, __list, __npages, __head, __ntails) \
> -	for (__i = 0, \
> -	     compound_range_next(__i, __npages, __list, &(__head), &(__ntails)); \
> -	     __i < __npages; __i += __ntails, \
> -	     compound_range_next(__i, __npages, __list, &(__head), &(__ntails)))
> +#define gup_for_each_folio_range(__i, __list, __npages, __folio, __ntails) \
> +	for (__i = 0; \
> +	     (__folio = gup_folio_range_next(__i, __npages, __list, &(__ntails))) != NULL; \
> +	     __i += __ntails)
>   
>   static inline struct folio *gup_folio_next(unsigned long i,
>   		unsigned long npages, struct page **list, unsigned int *ntails)
> @@ -353,13 +346,16 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
>   				      bool make_dirty)
>   {
>   	unsigned long index;
> -	struct page *head;
> -	unsigned int ntails;
> +	struct folio *folio;
> +	unsigned int nr;
>   
> -	for_each_compound_range(index, &page, npages, head, ntails) {
> -		if (make_dirty && !PageDirty(head))
> -			set_page_dirty_lock(head);
> -		put_compound_head(head, ntails, FOLL_PIN);
> +	gup_for_each_folio_range(index, &page, npages, folio, nr) {
> +		if (make_dirty && !folio_test_dirty(folio)) {
> +			folio_lock(folio);
> +			folio_mark_dirty(folio);
> +			folio_unlock(folio);
> +		}
> +		gup_put_folio(folio, nr, FOLL_PIN);
>   	}
>   }
>   EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 16/17] mm: Add isolate_lru_folio()
  2022-01-02 21:57 ` [PATCH 16/17] mm: Add isolate_lru_folio() Matthew Wilcox (Oracle)
  2022-01-04  8:36   ` Christoph Hellwig
@ 2022-01-05  8:44   ` John Hubbard
  2022-01-06  0:34     ` Matthew Wilcox
  1 sibling, 1 reply; 72+ messages in thread
From: John Hubbard @ 2022-01-05  8:44 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> Turn isolate_lru_page() into a wrapper around isolate_lru_folio().
> TestClearPageLRU() would have always failed on a tail page, so
> returning -EBUSY is the same behaviour.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   arch/powerpc/include/asm/mmu_context.h |  1 -
>   mm/folio-compat.c                      |  8 +++++
>   mm/internal.h                          |  3 +-
>   mm/vmscan.c                            | 43 ++++++++++++--------------
>   4 files changed, 29 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9ba6b585337f..b9cab0a11421 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -21,7 +21,6 @@ extern void destroy_context(struct mm_struct *mm);
>   #ifdef CONFIG_SPAPR_TCE_IOMMU
>   struct mm_iommu_table_group_mem_t;
>   
> -extern int isolate_lru_page(struct page *page);	/* from internal.h */
>   extern bool mm_iommu_preregistered(struct mm_struct *mm);
>   extern long mm_iommu_new(struct mm_struct *mm,
>   		unsigned long ua, unsigned long entries,
> diff --git a/mm/folio-compat.c b/mm/folio-compat.c
> index 749555a232a8..782e766cd1ee 100644
> --- a/mm/folio-compat.c
> +++ b/mm/folio-compat.c
> @@ -7,6 +7,7 @@
>   #include <linux/migrate.h>
>   #include <linux/pagemap.h>
>   #include <linux/swap.h>
> +#include "internal.h"
>   
>   struct address_space *page_mapping(struct page *page)
>   {
> @@ -151,3 +152,10 @@ int try_to_release_page(struct page *page, gfp_t gfp)
>   	return filemap_release_folio(page_folio(page), gfp);
>   }
>   EXPORT_SYMBOL(try_to_release_page);
> +
> +int isolate_lru_page(struct page *page)
> +{
> +	if (WARN_RATELIMIT(PageTail(page), "trying to isolate tail page"))
> +		return -EBUSY;
> +	return isolate_lru_folio((struct folio *)page);

This cast is not great to have, but it is correct, given the above
constraints about tail pages...and I expect this sort of thing is
temporary, anyway?

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> +}
> diff --git a/mm/internal.h b/mm/internal.h
> index e989d8ceec91..977d5116d327 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -178,7 +178,8 @@ extern unsigned long highest_memmap_pfn;
>   /*
>    * in mm/vmscan.c:
>    */
> -extern int isolate_lru_page(struct page *page);
> +int isolate_lru_page(struct page *page);
> +int isolate_lru_folio(struct folio *folio);
>   extern void putback_lru_page(struct page *page);
>   extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason);
>   
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fb9584641ac7..ac2f5b76cdb2 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2168,45 +2168,40 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>   }
>   
>   /**
> - * isolate_lru_page - tries to isolate a page from its LRU list
> - * @page: page to isolate from its LRU list
> + * isolate_lru_folio - Try to isolate a folio from its LRU list.
> + * @folio: Folio to isolate from its LRU list.
>    *
> - * Isolates a @page from an LRU list, clears PageLRU and adjusts the
> - * vmstat statistic corresponding to whatever LRU list the page was on.
> + * Isolate a @folio from an LRU list and adjust the vmstat statistic
> + * corresponding to whatever LRU list the folio was on.
>    *
> - * Returns 0 if the page was removed from an LRU list.
> - * Returns -EBUSY if the page was not on an LRU list.
> - *
> - * The returned page will have PageLRU() cleared.  If it was found on
> - * the active list, it will have PageActive set.  If it was found on
> - * the unevictable list, it will have the PageUnevictable bit set. That flag
> + * The folio will have its LRU flag cleared.  If it was found on the
> + * active list, it will have the Active flag set.  If it was found on the
> + * unevictable list, it will have the Unevictable flag set.  These flags
>    * may need to be cleared by the caller before letting the page go.
>    *
> - * The vmstat statistic corresponding to the list on which the page was
> - * found will be decremented.
> - *
> - * Restrictions:
> + * Context:
>    *
>    * (1) Must be called with an elevated refcount on the page. This is a
> - *     fundamental difference from isolate_lru_pages (which is called
> + *     fundamental difference from isolate_lru_pages() (which is called
>    *     without a stable reference).
> - * (2) the lru_lock must not be held.
> - * (3) interrupts must be enabled.
> + * (2) The lru_lock must not be held.
> + * (3) Interrupts must be enabled.
> + *
> + * Return: 0 if the folio was removed from an LRU list.
> + * -EBUSY if the folio was not on an LRU list.
>    */
> -int isolate_lru_page(struct page *page)
> +int isolate_lru_folio(struct folio *folio)
>   {
> -	struct folio *folio = page_folio(page);
>   	int ret = -EBUSY;
>   
> -	VM_BUG_ON_PAGE(!page_count(page), page);
> -	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
> +	VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio);
>   
> -	if (TestClearPageLRU(page)) {
> +	if (folio_test_clear_lru(folio)) {
>   		struct lruvec *lruvec;
>   
> -		get_page(page);
> +		folio_get(folio);
>   		lruvec = folio_lruvec_lock_irq(folio);
> -		del_page_from_lru_list(page, lruvec);
> +		lruvec_del_folio(lruvec, folio);
>   		unlock_page_lruvec_irq(lruvec);
>   		ret = 0;
>   	}



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 17/17] gup: Convert check_and_migrate_movable_pages() to use a folio
  2022-01-02 21:57 ` [PATCH 17/17] gup: Convert check_and_migrate_movable_pages() to use a folio Matthew Wilcox (Oracle)
  2022-01-04  8:37   ` Christoph Hellwig
@ 2022-01-05  9:00   ` John Hubbard
  1 sibling, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-05  9:00 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm; +Cc: Andrew Morton

On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> Switch from head pages to folios.  This removes an assumption that
> THPs are the only way to have a high-order page.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 28 ++++++++++++++--------------
>   1 file changed, 14 insertions(+), 14 deletions(-)
> 

Reviewed-by: John Hubbard <jhubbard@nvidia.com>


thanks,
-- 
John Hubbard
NVIDIA

> diff --git a/mm/gup.c b/mm/gup.c
> index 76717e05413d..eb7c66e2b785 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1822,41 +1822,41 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>   	bool drain_allow = true;
>   	LIST_HEAD(movable_page_list);
>   	long ret = 0;
> -	struct page *prev_head = NULL;
> -	struct page *head;
> +	struct folio *folio, *prev_folio = NULL;
>   	struct migration_target_control mtc = {
>   		.nid = NUMA_NO_NODE,
>   		.gfp_mask = GFP_USER | __GFP_NOWARN,
>   	};
>   
>   	for (i = 0; i < nr_pages; i++) {
> -		head = compound_head(pages[i]);
> -		if (head == prev_head)
> +		folio = page_folio(pages[i]);
> +		if (folio == prev_folio)
>   			continue;
> -		prev_head = head;
> +		prev_folio = folio;
>   		/*
>   		 * If we get a movable page, since we are going to be pinning
>   		 * these entries, try to move them out if possible.
>   		 */
> -		if (!is_pinnable_page(head)) {
> -			if (PageHuge(head)) {
> -				if (!isolate_huge_page(head, &movable_page_list))
> +		if (!is_pinnable_page(&folio->page)) {
> +			if (folio_test_hugetlb(folio)) {
> +				if (!isolate_huge_page(&folio->page,
> +							&movable_page_list))
>   					isolation_error_count++;
>   			} else {
> -				if (!PageLRU(head) && drain_allow) {
> +				if (!folio_test_lru(folio) && drain_allow) {
>   					lru_add_drain_all();
>   					drain_allow = false;
>   				}
>   
> -				if (isolate_lru_page(head)) {
> +				if (isolate_lru_folio(folio)) {
>   					isolation_error_count++;
>   					continue;
>   				}
> -				list_add_tail(&head->lru, &movable_page_list);
> -				mod_node_page_state(page_pgdat(head),
> +				list_add_tail(&folio->lru, &movable_page_list);
> +				node_stat_mod_folio(folio,
>   						    NR_ISOLATED_ANON +
> -						    page_is_file_lru(head),
> -						    thp_nr_pages(head));
> +						    folio_is_file_lru(folio),
> +						    folio_nr_pages(folio));
>   			}
>   		}
>   	}



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 16/17] mm: Add isolate_lru_folio()
  2022-01-05  8:44   ` John Hubbard
@ 2022-01-06  0:34     ` Matthew Wilcox
  0 siblings, 0 replies; 72+ messages in thread
From: Matthew Wilcox @ 2022-01-06  0:34 UTC (permalink / raw)
  To: John Hubbard; +Cc: linux-mm, Andrew Morton

On Wed, Jan 05, 2022 at 12:44:38AM -0800, John Hubbard wrote:
> On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> > +int isolate_lru_page(struct page *page)
> > +{
> > +	if (WARN_RATELIMIT(PageTail(page), "trying to isolate tail page"))
> > +		return -EBUSY;
> > +	return isolate_lru_folio((struct folio *)page);
> 
> This cast is not great to have, but it is correct, given the above
> constraints about tail pages...and I expect this sort of thing is
> temporary, anyway?

We could call page_folio() here, but we already know the answer.
The whole of folio-compat.c is temporary.  Any time I have the luxury
of boredom, grepping for any of the functions in folio-compat will
show something that needs to be converted to use folios.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 03/17] mm: Add folio_pincount_ptr()
  2022-01-02 21:57 ` [PATCH 03/17] mm: Add folio_pincount_ptr() Matthew Wilcox (Oracle)
  2022-01-04  8:02   ` Christoph Hellwig
  2022-01-04 21:43   ` John Hubbard
@ 2022-01-06 21:57   ` William Kucharski
  2 siblings, 0 replies; 72+ messages in thread
From: William Kucharski @ 2022-01-06 21:57 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

Comment inline.

> On Jan 2, 2022, at 2:57 PM, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
> 
> This is the folio equivalent of compound_pincount_ptr().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> include/linux/mm_types.h | 6 ++++++
> 1 file changed, 6 insertions(+)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index c3a6e6209600..09d9e2c4a2c5 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -309,6 +309,12 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
> 	return &page[1].compound_mapcount;
> }
> 
> +static inline atomic_t *folio_pincount_ptr(struct folio *folio)
> +{
> +	struct page *tail = &folio->page + 2;

Please add a comment explaining why this is "2."

> +	return &tail->hpage_pinned_refcount;
> +}
> +
> static inline atomic_t *compound_pincount_ptr(struct page *page)
> {
> 	return &page[2].hpage_pinned_refcount;
> -- 
> 2.33.0
> 



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 07/17] gup: Add gup_put_folio()
  2022-01-02 21:57 ` [PATCH 07/17] gup: Add gup_put_folio() Matthew Wilcox (Oracle)
  2022-01-04  8:22   ` Christoph Hellwig
  2022-01-05  6:52   ` John Hubbard
@ 2022-01-06 22:05   ` William Kucharski
  2 siblings, 0 replies; 72+ messages in thread
From: William Kucharski @ 2022-01-06 22:05 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

Comment inline.

> On Jan 2, 2022, at 2:57 PM, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
> 
> put_compound_head() is turned into a call to gup_put_folio().
> This removes the last call to put_page_refs(), so delete it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> mm/gup.c | 44 +++++++++++++++-----------------------------
> 1 file changed, 15 insertions(+), 29 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 58e5cfaaa676..6d827f7d66d8 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -36,29 +36,11 @@ static void folio_pincount_add(struct folio *folio, int refs)
> 	atomic_add(refs, folio_pincount_ptr(folio));
> }
> 
> -static void hpage_pincount_sub(struct page *page, int refs)
> +static void folio_pincount_sub(struct folio *folio, int refs)
> {
> -	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
> -	VM_BUG_ON_PAGE(page != compound_head(page), page);
> -
> -	atomic_sub(refs, compound_pincount_ptr(page));
> -}
> -
> -/* Equivalent to calling put_page() @refs times. */
> -static void put_page_refs(struct page *page, int refs)
> -{
> -#ifdef CONFIG_DEBUG_VM
> -	if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
> -		return;
> -#endif

Should this retain a form of the CONFIG_DEBUG_VM warning here to trigger if
folio_pincount_ptr(folio) is < refs?

> +	VM_BUG_ON_FOLIO(!folio_pincount_available(folio), folio);
> -	/*
> -	 * Calling put_page() for each ref is unnecessarily slow. Only the last
> -	 * ref needs a put_page().
> -	 */
> -	if (refs > 1)
> -		page_ref_sub(page, refs - 1);
> -	put_page(page);
> +	atomic_sub(refs, folio_pincount_ptr(folio));
> }
> 
> /*
> @@ -175,19 +157,23 @@ struct page *try_grab_compound_head(struct page *page,
> 	return NULL;
> }
> 
> -static void put_compound_head(struct page *page, int refs, unsigned int flags)
> +static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
> {
> 	if (flags & FOLL_PIN) {
> -		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
> -				    refs);
> -
> -		if (hpage_pincount_available(page))
> -			hpage_pincount_sub(page, refs);
> +		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
> +		if (folio_pincount_available(folio))
> +			folio_pincount_sub(folio, refs);
> 		else
> 			refs *= GUP_PIN_COUNTING_BIAS;
> 	}
> 
> -	put_page_refs(page, refs);
> +	folio_put_refs(folio, refs);
> +}
> +
> +static void put_compound_head(struct page *page, int refs, unsigned int flags)
> +{
> +	VM_BUG_ON_PAGE(PageTail(page), page);
> +	gup_put_folio((struct folio *)page, refs, flags);
> }
> 
> /**
> @@ -228,7 +214,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
>  */
> void unpin_user_page(struct page *page)
> {
> -	put_compound_head(compound_head(page), 1, FOLL_PIN);
> +	gup_put_folio(page_folio(page), 1, FOLL_PIN);
> }
> EXPORT_SYMBOL(unpin_user_page);
> 
> -- 
> 2.33.0
> 
> 



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 00/17] Convert GUP to folios
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (16 preceding siblings ...)
  2022-01-02 21:57 ` [PATCH 17/17] gup: Convert check_and_migrate_movable_pages() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-06 22:12 ` William Kucharski
  2022-01-07 18:54 ` Jason Gunthorpe
  18 siblings, 0 replies; 72+ messages in thread
From: William Kucharski @ 2022-01-06 22:12 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

For the series:

Reviewed-by: William Kucharski <william.kucharski@oracle.com>

> On Jan 2, 2022, at 2:57 PM, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
> 
> This patch series is against my current folio for-next branch.  I know
> it won't apply to sfr's next tree, and it's not for-next material yet.
> I intend to submit it for 5.18 after I've rebased it to one of the
> 5.17-rc releases.
> 
> The overall effect of this (ignoring the primary "preparing for folios
> that are not PAGE or PMD sized" purpose) is to reduce the size of gup.o
> by ~700 bytes in the config I normally test with.
> 
> This patchset just converts existing implementations to use folios.
> There's no new API for consumers here to provide information in a more
> efficient (address, length) format.  That will be a separate patchset.
> 
> Matthew Wilcox (Oracle) (17):
>  mm: Add folio_put_refs()
>  mm: Add folio_pincount_available()
>  mm: Add folio_pincount_ptr()
>  mm: Convert page_maybe_dma_pinned() to use a folio
>  gup: Add try_get_folio()
>  mm: Remove page_cache_add_speculative() and
>    page_cache_get_speculative()
>  gup: Add gup_put_folio()
>  gup: Add try_grab_folio()
>  gup: Convert gup_pte_range() to use a folio
>  gup: Convert gup_hugepte() to use a folio
>  gup: Convert gup_huge_pmd() to use a folio
>  gup: Convert gup_huge_pud() to use a folio
>  gup: Convert gup_huge_pgd() to use a folio
>  gup: Convert for_each_compound_head() to gup_for_each_folio()
>  gup: Convert for_each_compound_range() to gup_for_each_folio_range()
>  mm: Add isolate_lru_folio()
>  gup: Convert check_and_migrate_movable_pages() to use a folio
> 
> arch/powerpc/include/asm/mmu_context.h |   1 -
> include/linux/mm.h                     |  58 +++--
> include/linux/mm_types.h               |   6 +
> include/linux/pagemap.h                |  11 -
> mm/folio-compat.c                      |   8 +
> mm/gup.c                               | 312 ++++++++++++-------------
> mm/hugetlb.c                           |   7 +-
> mm/internal.h                          |   3 +-
> mm/vmscan.c                            |  43 ++--
> 9 files changed, 222 insertions(+), 227 deletions(-)
> 
> -- 
> 2.33.0
> 
> 



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 05/17] gup: Add try_get_folio()
  2022-01-05  1:25   ` John Hubbard
  2022-01-05  7:00     ` John Hubbard
@ 2022-01-07 18:23     ` Jason Gunthorpe
  2022-01-08  1:37     ` Matthew Wilcox
  2 siblings, 0 replies; 72+ messages in thread
From: Jason Gunthorpe @ 2022-01-07 18:23 UTC (permalink / raw)
  To: John Hubbard; +Cc: Matthew Wilcox (Oracle), linux-mm, Andrew Morton

On Tue, Jan 04, 2022 at 05:25:07PM -0800, John Hubbard wrote:

> > +	folio = page_folio(page);
> > +	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
> >   		return NULL;
> > -	if (unlikely(!page_cache_add_speculative(head, refs)))
> > +	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))
> 
> I'm a little lost about the meaning and intended use of the _rcu aspects
> of folio_ref_try_add_rcu() here. For example, try_get_folio() does not
> require that callers are in an rcu read section, right? This is probably
> just a documentation question, sorry if it's obvious--I wasn't able to
> work it out on my own.

Normally this is called under gup_fast, so it is operating under the
quasi-rcu it uses. I have no idea about preempt kernels if this should
be marked with rcu or if the local_irq_save is good enough..

The exceptional cases are get_gate_page() which I suppose relies on
the mmap lock to ensure that the special gate page cannot be
concurrently freed.

Then there are the cases under slow GUP (eg follow_page_pte()) which
don't need "try" at all because they hold the PTLs so it is guarenteed
the refcount should be non-zero AFAIK..

It wouldn't be a bad idea to have a non-try version of this just for
clarity, could it peform a bit better?

		/*
		 * try_grab_page() should always succeed here, because: a) we
		 * hold the pmd (ptl) lock, and b) we've just checked that the
		 * huge pmd (head) page is present in the page tables. The ptl
		 * prevents the head page and tail pages from being rearranged
		 * in any way. So this page must be available at this point,
		 * unless the page refcount overflowed:
		 */
		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {

Jason


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 00/17] Convert GUP to folios
  2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (17 preceding siblings ...)
  2022-01-06 22:12 ` [PATCH 00/17] Convert GUP to folios William Kucharski
@ 2022-01-07 18:54 ` Jason Gunthorpe
  18 siblings, 0 replies; 72+ messages in thread
From: Jason Gunthorpe @ 2022-01-07 18:54 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, John Hubbard, Andrew Morton

On Sun, Jan 02, 2022 at 09:57:12PM +0000, Matthew Wilcox (Oracle) wrote:
> This patch series is against my current folio for-next branch.  I know
> it won't apply to sfr's next tree, and it's not for-next material yet.
> I intend to submit it for 5.18 after I've rebased it to one of the
> 5.17-rc releases.
> 
> The overall effect of this (ignoring the primary "preparing for folios
> that are not PAGE or PMD sized" purpose) is to reduce the size of gup.o
> by ~700 bytes in the config I normally test with.
> 
> This patchset just converts existing implementations to use folios.
> There's no new API for consumers here to provide information in a more
> efficient (address, length) format.  That will be a separate patchset.
> 
> Matthew Wilcox (Oracle) (17):
>   mm: Add folio_put_refs()
>   mm: Add folio_pincount_available()
>   mm: Add folio_pincount_ptr()
>   mm: Convert page_maybe_dma_pinned() to use a folio
>   gup: Add try_get_folio()
>   mm: Remove page_cache_add_speculative() and
>     page_cache_get_speculative()
>   gup: Add gup_put_folio()
>   gup: Add try_grab_folio()
>   gup: Convert gup_pte_range() to use a folio
>   gup: Convert gup_hugepte() to use a folio
>   gup: Convert gup_huge_pmd() to use a folio
>   gup: Convert gup_huge_pud() to use a folio
>   gup: Convert gup_huge_pgd() to use a folio
>   gup: Convert for_each_compound_head() to gup_for_each_folio()
>   gup: Convert for_each_compound_range() to gup_for_each_folio_range()
>   mm: Add isolate_lru_folio()
>   gup: Convert check_and_migrate_movable_pages() to use a folio

Looked fine to me

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks,
Jason


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 05/17] gup: Add try_get_folio()
  2022-01-05  1:25   ` John Hubbard
  2022-01-05  7:00     ` John Hubbard
  2022-01-07 18:23     ` Jason Gunthorpe
@ 2022-01-08  1:37     ` Matthew Wilcox
  2022-01-08  2:36       ` John Hubbard
  2022-01-10 15:01       ` Jason Gunthorpe
  2 siblings, 2 replies; 72+ messages in thread
From: Matthew Wilcox @ 2022-01-08  1:37 UTC (permalink / raw)
  To: John Hubbard; +Cc: linux-mm, Andrew Morton

On Tue, Jan 04, 2022 at 05:25:07PM -0800, John Hubbard wrote:
> On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> > +	folio = page_folio(page);
> > +	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
> >   		return NULL;
> > -	if (unlikely(!page_cache_add_speculative(head, refs)))
> > +	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))
> 
> I'm a little lost about the meaning and intended use of the _rcu aspects
> of folio_ref_try_add_rcu() here. For example, try_get_folio() does not
> require that callers are in an rcu read section, right? This is probably
> just a documentation question, sorry if it's obvious--I wasn't able to
> work it out on my own.

page_cache_add_speculative() always assumed that you were at least as
protected as RCU.  Quoting v5.4's pagemap.h:

 * speculatively take a reference to a page.
 * If the page is free (_refcount == 0), then _refcount is untouched, and 0
 * is returned. Otherwise, _refcount is incremented by 1 and 1 is returned.
 *
 * This function must be called inside the same rcu_read_lock() section as has
 * been used to lookup the page in the pagecache radix-tree (or page table):
 * this allows allocators to use a synchronize_rcu() to stabilize _refcount.

... so is it the addition of "rcu" in the name that's scared you?
If so, that seems like a good thing, because previously you were using
page_cache_add_speculative() without realising that RCU protection
was required.

I think there is sufficient protection; either we have interrupts disabled
in gup-fast (which prevents RCU from finishing a grace period), or we
have the mmap_sem held in gup-slow (which prevents pages from being
removed from the page tables).

But maybe I've missed something?


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 05/17] gup: Add try_get_folio()
  2022-01-08  1:37     ` Matthew Wilcox
@ 2022-01-08  2:36       ` John Hubbard
  2022-01-10 15:01       ` Jason Gunthorpe
  1 sibling, 0 replies; 72+ messages in thread
From: John Hubbard @ 2022-01-08  2:36 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, Andrew Morton

On 1/7/22 17:37, Matthew Wilcox wrote:
> On Tue, Jan 04, 2022 at 05:25:07PM -0800, John Hubbard wrote:
>> On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
>>> +	folio = page_folio(page);
>>> +	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
>>>    		return NULL;
>>> -	if (unlikely(!page_cache_add_speculative(head, refs)))
>>> +	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))
>>
>> I'm a little lost about the meaning and intended use of the _rcu aspects
>> of folio_ref_try_add_rcu() here. For example, try_get_folio() does not
>> require that callers are in an rcu read section, right? This is probably
>> just a documentation question, sorry if it's obvious--I wasn't able to
>> work it out on my own.
> 
> page_cache_add_speculative() always assumed that you were at least as
> protected as RCU.  Quoting v5.4's pagemap.h:
> 
>   * speculatively take a reference to a page.
>   * If the page is free (_refcount == 0), then _refcount is untouched, and 0
>   * is returned. Otherwise, _refcount is incremented by 1 and 1 is returned.
>   *
>   * This function must be called inside the same rcu_read_lock() section as has
>   * been used to lookup the page in the pagecache radix-tree (or page table):
>   * this allows allocators to use a synchronize_rcu() to stabilize _refcount.

Thanks for the v5.4 documentation reference. I did read and understand that a
while back, but it didn't show up at all when I was looking through the code
during this review, so I missed a chance to re-read it. I'd forgotten about the
RCU qualifier there.

> 
> ... so is it the addition of "rcu" in the name that's scared you?
> If so, that seems like a good thing, because previously you were using
> page_cache_add_speculative() without realising that RCU protection
> was required.
> 
> I think there is sufficient protection; either we have interrupts disabled
> in gup-fast (which prevents RCU from finishing a grace period), or we
> have the mmap_sem held in gup-slow (which prevents pages from being
> removed from the page tables).
> 
> But maybe I've missed something?

Clearly, I'm not the right person to answer that question, at this point. :)

FWIW, I went back and looked again, and it seems like everything is covered.

On the documentation front, I do miss the v5.4 helpful comment block above
__page_cache_add_speculative(), it would be nice to have an updated version
of that revived, but of course that's a separate point from this patchset.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()
  2022-01-05  8:17   ` John Hubbard
@ 2022-01-09  4:39     ` Matthew Wilcox
  2022-01-09  8:01       ` John Hubbard
  0 siblings, 1 reply; 72+ messages in thread
From: Matthew Wilcox @ 2022-01-09  4:39 UTC (permalink / raw)
  To: John Hubbard; +Cc: linux-mm, Andrew Morton

On Wed, Jan 05, 2022 at 12:17:46AM -0800, John Hubbard wrote:
> > +		if (!folio_test_dirty(folio)) {
> > +			folio_lock(folio);
> > +			folio_mark_dirty(folio);
> > +			folio_unlock(folio);
> 
> At some point, maybe even here, I suspect that creating the folio
> version of set_page_dirty_lock() would help. I'm sure you have
> a better feel for whether it helps, after doing all of this conversion
> work, but it just sort of jumped out at me as surprising to see it
> in this form.

I really hate set_page_dirty_lock().  It smacks of "there is a locking
rule here which we're violating, so we'll just take the lock to fix it"
without understanding why there's a locking problem here.

As far as I can tell, originally, the intent was that you would lock
the page before modifying any of the data in the page.  ie you would
do:

	gup()
	lock_page()
	addr = kmap_page()
	*addr = 1;
	kunmap_page()
	set_page_dirty()
	unlock_page()
	put_page()

and that would prevent races between modifying the page and (starting)
writeback, not to mention truncate() and various other operations.

Clearly we can't do that for DMA-pinned pages.  There's only one lock
bit.  But do we even need to take the lock if we have the page pinned?
What are we protecting against?

If it's truncate(), I think we must already have protection against
that as we could easily have:

	pup()
				truncate()
	lock_page()
	set_page_dirty()
	unlock_page()
	unpin



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()
  2022-01-09  4:39     ` Matthew Wilcox
@ 2022-01-09  8:01       ` John Hubbard
  2022-01-10 15:22         ` Jan Kara
  0 siblings, 1 reply; 72+ messages in thread
From: John Hubbard @ 2022-01-09  8:01 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, Andrew Morton, Christoph Hellwig, Jan Kara

On 1/8/22 20:39, Matthew Wilcox wrote:
> On Wed, Jan 05, 2022 at 12:17:46AM -0800, John Hubbard wrote:
>>> +		if (!folio_test_dirty(folio)) {
>>> +			folio_lock(folio);
>>> +			folio_mark_dirty(folio);
>>> +			folio_unlock(folio);
>>
>> At some point, maybe even here, I suspect that creating the folio
>> version of set_page_dirty_lock() would help. I'm sure you have
>> a better feel for whether it helps, after doing all of this conversion
>> work, but it just sort of jumped out at me as surprising to see it
>> in this form.
> 
> I really hate set_page_dirty_lock().  It smacks of "there is a locking
> rule here which we're violating, so we'll just take the lock to fix it"
> without understanding why there's a locking problem here.
> 
> As far as I can tell, originally, the intent was that you would lock
> the page before modifying any of the data in the page.  ie you would
> do:
> 
> 	gup()
> 	lock_page()
> 	addr = kmap_page()
> 	*addr = 1;
> 	kunmap_page()
> 	set_page_dirty()
> 	unlock_page()
> 	put_page()
> 
> and that would prevent races between modifying the page and (starting)
> writeback, not to mention truncate() and various other operations.
> 
> Clearly we can't do that for DMA-pinned pages.  There's only one lock
> bit.  But do we even need to take the lock if we have the page pinned?
> What are we protecting against?

This is a fun question, because you're asking it at a point when the
overall problem remains unsolved. That is, the interaction between
file-backed pages and gup/pup is still completely broken.

And I don't have an answer for you: it does seem like lock_page() is
completely pointless here. Looking back, there are some 25 callers of
unpin_user_pages_dirty_lock(), and during all those patch reviews, no
one noticed this point!

Anyway...so maybe most or even all of the unpin_user_pages_dirty_lock()
callers do not really need a _lock variant, after all.

Or, maybe it really is required and we're overlooking some subtle
filesystem-related point. It would be nice to get a second look from
Christoph Hellwig and Jan Kara (+CC).

> 
> If it's truncate(), I think we must already have protection against
> that as we could easily have:
> 
> 	pup()
> 				truncate()
> 	lock_page()
> 	set_page_dirty()
> 	unlock_page()
> 	unpin
> 

I think truncate vs. gup/pup() is no more and and no less broken in this
regard than it's ever been. We need another LSF/MM conference with some
more shouting, and/or file leases, for that. :)


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 05/17] gup: Add try_get_folio()
  2022-01-08  1:37     ` Matthew Wilcox
  2022-01-08  2:36       ` John Hubbard
@ 2022-01-10 15:01       ` Jason Gunthorpe
  1 sibling, 0 replies; 72+ messages in thread
From: Jason Gunthorpe @ 2022-01-10 15:01 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: John Hubbard, linux-mm, Andrew Morton

On Sat, Jan 08, 2022 at 01:37:09AM +0000, Matthew Wilcox wrote:

> I think there is sufficient protection; either we have interrupts
> disabled in gup-fast (which prevents RCU from finishing a grace
> period), or we have the mmap_sem held in gup-slow (which prevents
> pages from being removed from the page tables).

mmap sem doesn't prevent pages from being removed, for instance
unmap_mapping_pages() doesn't hold it.

It is safe because gup slow holds the PTLs when it calls this.

No idea about how the get_gate_page() works

Jason


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()
  2022-01-09  8:01       ` John Hubbard
@ 2022-01-10 15:22         ` Jan Kara
  2022-01-10 15:52           ` Matthew Wilcox
  0 siblings, 1 reply; 72+ messages in thread
From: Jan Kara @ 2022-01-10 15:22 UTC (permalink / raw)
  To: John Hubbard
  Cc: Matthew Wilcox, linux-mm, Andrew Morton, Christoph Hellwig, Jan Kara

On Sun 09-01-22 00:01:49, John Hubbard wrote:
> On 1/8/22 20:39, Matthew Wilcox wrote:
> > On Wed, Jan 05, 2022 at 12:17:46AM -0800, John Hubbard wrote:
> > > > +		if (!folio_test_dirty(folio)) {
> > > > +			folio_lock(folio);
> > > > +			folio_mark_dirty(folio);
> > > > +			folio_unlock(folio);
> > > 
> > > At some point, maybe even here, I suspect that creating the folio
> > > version of set_page_dirty_lock() would help. I'm sure you have
> > > a better feel for whether it helps, after doing all of this conversion
> > > work, but it just sort of jumped out at me as surprising to see it
> > > in this form.
> > 
> > I really hate set_page_dirty_lock().  It smacks of "there is a locking
> > rule here which we're violating, so we'll just take the lock to fix it"
> > without understanding why there's a locking problem here.
> > 
> > As far as I can tell, originally, the intent was that you would lock
> > the page before modifying any of the data in the page.  ie you would
> > do:
> > 
> > 	gup()
> > 	lock_page()
> > 	addr = kmap_page()
> > 	*addr = 1;
> > 	kunmap_page()
> > 	set_page_dirty()
> > 	unlock_page()
> > 	put_page()
> > 
> > and that would prevent races between modifying the page and (starting)
> > writeback, not to mention truncate() and various other operations.
> > 
> > Clearly we can't do that for DMA-pinned pages.  There's only one lock
> > bit.  But do we even need to take the lock if we have the page pinned?
> > What are we protecting against?
> 
> This is a fun question, because you're asking it at a point when the
> overall problem remains unsolved. That is, the interaction between
> file-backed pages and gup/pup is still completely broken.
> 
> And I don't have an answer for you: it does seem like lock_page() is
> completely pointless here. Looking back, there are some 25 callers of
> unpin_user_pages_dirty_lock(), and during all those patch reviews, no
> one noticed this point!

I'd say it is underdocumented but not obviously pointless :) AFAIR (and
Christoph or Andrew may well correct me) the page lock in
set_page_dirty_lock() is there to protect metadata associated with the page
through page->private. Otherwise truncate could free these (e.g.
block_invalidatepage()) while ->set_page_dirty() callback (e.g.
__set_page_dirty_buffers()) works on this metadata.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()
  2022-01-10 15:22         ` Jan Kara
@ 2022-01-10 15:52           ` Matthew Wilcox
  2022-01-10 20:36             ` Jan Kara
  0 siblings, 1 reply; 72+ messages in thread
From: Matthew Wilcox @ 2022-01-10 15:52 UTC (permalink / raw)
  To: Jan Kara; +Cc: John Hubbard, linux-mm, Andrew Morton, Christoph Hellwig

On Mon, Jan 10, 2022 at 04:22:08PM +0100, Jan Kara wrote:
> On Sun 09-01-22 00:01:49, John Hubbard wrote:
> > On 1/8/22 20:39, Matthew Wilcox wrote:
> > > On Wed, Jan 05, 2022 at 12:17:46AM -0800, John Hubbard wrote:
> > > > > +		if (!folio_test_dirty(folio)) {
> > > > > +			folio_lock(folio);
> > > > > +			folio_mark_dirty(folio);
> > > > > +			folio_unlock(folio);
> > > > 
> > > > At some point, maybe even here, I suspect that creating the folio
> > > > version of set_page_dirty_lock() would help. I'm sure you have
> > > > a better feel for whether it helps, after doing all of this conversion
> > > > work, but it just sort of jumped out at me as surprising to see it
> > > > in this form.
> > > 
> > > I really hate set_page_dirty_lock().  It smacks of "there is a locking
> > > rule here which we're violating, so we'll just take the lock to fix it"
> > > without understanding why there's a locking problem here.
> > > 
> > > As far as I can tell, originally, the intent was that you would lock
> > > the page before modifying any of the data in the page.  ie you would
> > > do:
> > > 
> > > 	gup()
> > > 	lock_page()
> > > 	addr = kmap_page()
> > > 	*addr = 1;
> > > 	kunmap_page()
> > > 	set_page_dirty()
> > > 	unlock_page()
> > > 	put_page()
> > > 
> > > and that would prevent races between modifying the page and (starting)
> > > writeback, not to mention truncate() and various other operations.
> > > 
> > > Clearly we can't do that for DMA-pinned pages.  There's only one lock
> > > bit.  But do we even need to take the lock if we have the page pinned?
> > > What are we protecting against?
> > 
> > This is a fun question, because you're asking it at a point when the
> > overall problem remains unsolved. That is, the interaction between
> > file-backed pages and gup/pup is still completely broken.
> > 
> > And I don't have an answer for you: it does seem like lock_page() is
> > completely pointless here. Looking back, there are some 25 callers of
> > unpin_user_pages_dirty_lock(), and during all those patch reviews, no
> > one noticed this point!
> 
> I'd say it is underdocumented but not obviously pointless :) AFAIR (and
> Christoph or Andrew may well correct me) the page lock in
> set_page_dirty_lock() is there to protect metadata associated with the page
> through page->private. Otherwise truncate could free these (e.g.
> block_invalidatepage()) while ->set_page_dirty() callback (e.g.
> __set_page_dirty_buffers()) works on this metadata.

Yes, but ... we have an inconsistency between DMA writes to the page and
CPU writes to the page.

	fd = open(file)
	write(fd, 1024 * 1024)
	mmap(NULL, 1024 * 1024, PROT_RW, MAP_SHARED, fd, 0)
	register-memory-with-RDMA
	ftruncate(fd, 0);	// page is removed from page cache
	ftruncate(fd, 1024 * 1024) 

Now if we do a store from the CPU, we instantiate a new page in the
page cache and the store will be written back to the file.  If we do
an RDMA-write, the write goes to the old page and will be lost.  Indeed,
it's no longer visible to the CPU (but is visible to other RDMA reads!)

Which is fine if the program did it itself because it's doing something
clearly bonkers, but another program might be the one doing the
two truncate() steps, and this would surprise an innocent program.

I still favour blocking the truncate-down (or holepunch) until there
are no pinned pages in the inode.  But I know this is a change in
behaviour since for some reason, truncate() gets to override mmap().


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()
  2022-01-10 15:52           ` Matthew Wilcox
@ 2022-01-10 20:36             ` Jan Kara
  2022-01-10 21:10               ` Matthew Wilcox
  0 siblings, 1 reply; 72+ messages in thread
From: Jan Kara @ 2022-01-10 20:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jan Kara, John Hubbard, linux-mm, Andrew Morton, Christoph Hellwig

On Mon 10-01-22 15:52:51, Matthew Wilcox wrote:
> On Mon, Jan 10, 2022 at 04:22:08PM +0100, Jan Kara wrote:
> > On Sun 09-01-22 00:01:49, John Hubbard wrote:
> > > On 1/8/22 20:39, Matthew Wilcox wrote:
> > > > On Wed, Jan 05, 2022 at 12:17:46AM -0800, John Hubbard wrote:
> > > > > > +		if (!folio_test_dirty(folio)) {
> > > > > > +			folio_lock(folio);
> > > > > > +			folio_mark_dirty(folio);
> > > > > > +			folio_unlock(folio);
> > > > > 
> > > > > At some point, maybe even here, I suspect that creating the folio
> > > > > version of set_page_dirty_lock() would help. I'm sure you have
> > > > > a better feel for whether it helps, after doing all of this conversion
> > > > > work, but it just sort of jumped out at me as surprising to see it
> > > > > in this form.
> > > > 
> > > > I really hate set_page_dirty_lock().  It smacks of "there is a locking
> > > > rule here which we're violating, so we'll just take the lock to fix it"
> > > > without understanding why there's a locking problem here.
> > > > 
> > > > As far as I can tell, originally, the intent was that you would lock
> > > > the page before modifying any of the data in the page.  ie you would
> > > > do:
> > > > 
> > > > 	gup()
> > > > 	lock_page()
> > > > 	addr = kmap_page()
> > > > 	*addr = 1;
> > > > 	kunmap_page()
> > > > 	set_page_dirty()
> > > > 	unlock_page()
> > > > 	put_page()
> > > > 
> > > > and that would prevent races between modifying the page and (starting)
> > > > writeback, not to mention truncate() and various other operations.
> > > > 
> > > > Clearly we can't do that for DMA-pinned pages.  There's only one lock
> > > > bit.  But do we even need to take the lock if we have the page pinned?
> > > > What are we protecting against?
> > > 
> > > This is a fun question, because you're asking it at a point when the
> > > overall problem remains unsolved. That is, the interaction between
> > > file-backed pages and gup/pup is still completely broken.
> > > 
> > > And I don't have an answer for you: it does seem like lock_page() is
> > > completely pointless here. Looking back, there are some 25 callers of
> > > unpin_user_pages_dirty_lock(), and during all those patch reviews, no
> > > one noticed this point!
> > 
> > I'd say it is underdocumented but not obviously pointless :) AFAIR (and
> > Christoph or Andrew may well correct me) the page lock in
> > set_page_dirty_lock() is there to protect metadata associated with the page
> > through page->private. Otherwise truncate could free these (e.g.
> > block_invalidatepage()) while ->set_page_dirty() callback (e.g.
> > __set_page_dirty_buffers()) works on this metadata.
> 
> Yes, but ... we have an inconsistency between DMA writes to the page and
> CPU writes to the page.
> 
> 	fd = open(file)
> 	write(fd, 1024 * 1024)
> 	mmap(NULL, 1024 * 1024, PROT_RW, MAP_SHARED, fd, 0)
> 	register-memory-with-RDMA
> 	ftruncate(fd, 0);	// page is removed from page cache
> 	ftruncate(fd, 1024 * 1024) 
> 
> Now if we do a store from the CPU, we instantiate a new page in the
> page cache and the store will be written back to the file.  If we do
> an RDMA-write, the write goes to the old page and will be lost.  Indeed,
> it's no longer visible to the CPU (but is visible to other RDMA reads!)
> 
> Which is fine if the program did it itself because it's doing something
> clearly bonkers, but another program might be the one doing the
> two truncate() steps, and this would surprise an innocent program.
> 
> I still favour blocking the truncate-down (or holepunch) until there
> are no pinned pages in the inode.  But I know this is a change in
> behaviour since for some reason, truncate() gets to override mmap().

I agree although this is unrelated to the page lock discussion above. In
principle we can consider such change (after all we chose this solution for
DAX) but it has some consequences - e.g. that disk space cannot be
reclaimed when someone has pagecache pages pinned (which may be unexpected
from sysadmin POV) or that we have to be careful or eager application
doing DIO (once it is converted to pinning) can block truncate
indefinitely.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()
  2022-01-10 20:36             ` Jan Kara
@ 2022-01-10 21:10               ` Matthew Wilcox
  2022-01-17 12:07                 ` Jan Kara
  0 siblings, 1 reply; 72+ messages in thread
From: Matthew Wilcox @ 2022-01-10 21:10 UTC (permalink / raw)
  To: Jan Kara; +Cc: John Hubbard, linux-mm, Andrew Morton, Christoph Hellwig

On Mon, Jan 10, 2022 at 09:36:11PM +0100, Jan Kara wrote:
> On Mon 10-01-22 15:52:51, Matthew Wilcox wrote:
> > On Mon, Jan 10, 2022 at 04:22:08PM +0100, Jan Kara wrote:
> > > On Sun 09-01-22 00:01:49, John Hubbard wrote:
> > > > On 1/8/22 20:39, Matthew Wilcox wrote:
> > > > > On Wed, Jan 05, 2022 at 12:17:46AM -0800, John Hubbard wrote:
> > > > > > > +		if (!folio_test_dirty(folio)) {
> > > > > > > +			folio_lock(folio);
> > > > > > > +			folio_mark_dirty(folio);
> > > > > > > +			folio_unlock(folio);
> > > > > > 
> > > > > > At some point, maybe even here, I suspect that creating the folio
> > > > > > version of set_page_dirty_lock() would help. I'm sure you have
> > > > > > a better feel for whether it helps, after doing all of this conversion
> > > > > > work, but it just sort of jumped out at me as surprising to see it
> > > > > > in this form.
> > > > > 
> > > > > I really hate set_page_dirty_lock().  It smacks of "there is a locking
> > > > > rule here which we're violating, so we'll just take the lock to fix it"
> > > > > without understanding why there's a locking problem here.
> > > > > 
> > > > > As far as I can tell, originally, the intent was that you would lock
> > > > > the page before modifying any of the data in the page.  ie you would
> > > > > do:
> > > > > 
> > > > > 	gup()
> > > > > 	lock_page()
> > > > > 	addr = kmap_page()
> > > > > 	*addr = 1;
> > > > > 	kunmap_page()
> > > > > 	set_page_dirty()
> > > > > 	unlock_page()
> > > > > 	put_page()
> > > > > 
> > > > > and that would prevent races between modifying the page and (starting)
> > > > > writeback, not to mention truncate() and various other operations.
> > > > > 
> > > > > Clearly we can't do that for DMA-pinned pages.  There's only one lock
> > > > > bit.  But do we even need to take the lock if we have the page pinned?
> > > > > What are we protecting against?
> > > > 
> > > > This is a fun question, because you're asking it at a point when the
> > > > overall problem remains unsolved. That is, the interaction between
> > > > file-backed pages and gup/pup is still completely broken.
> > > > 
> > > > And I don't have an answer for you: it does seem like lock_page() is
> > > > completely pointless here. Looking back, there are some 25 callers of
> > > > unpin_user_pages_dirty_lock(), and during all those patch reviews, no
> > > > one noticed this point!
> > > 
> > > I'd say it is underdocumented but not obviously pointless :) AFAIR (and
> > > Christoph or Andrew may well correct me) the page lock in
> > > set_page_dirty_lock() is there to protect metadata associated with the page
> > > through page->private. Otherwise truncate could free these (e.g.
> > > block_invalidatepage()) while ->set_page_dirty() callback (e.g.
> > > __set_page_dirty_buffers()) works on this metadata.
> > 
> > Yes, but ... we have an inconsistency between DMA writes to the page and
> > CPU writes to the page.
> > 
> > 	fd = open(file)
> > 	write(fd, 1024 * 1024)
> > 	mmap(NULL, 1024 * 1024, PROT_RW, MAP_SHARED, fd, 0)
> > 	register-memory-with-RDMA
> > 	ftruncate(fd, 0);	// page is removed from page cache
> > 	ftruncate(fd, 1024 * 1024) 
> > 
> > Now if we do a store from the CPU, we instantiate a new page in the
> > page cache and the store will be written back to the file.  If we do
> > an RDMA-write, the write goes to the old page and will be lost.  Indeed,
> > it's no longer visible to the CPU (but is visible to other RDMA reads!)
> > 
> > Which is fine if the program did it itself because it's doing something
> > clearly bonkers, but another program might be the one doing the
> > two truncate() steps, and this would surprise an innocent program.
> > 
> > I still favour blocking the truncate-down (or holepunch) until there
> > are no pinned pages in the inode.  But I know this is a change in
> > behaviour since for some reason, truncate() gets to override mmap().
> 
> I agree although this is unrelated to the page lock discussion above. In
> principle we can consider such change (after all we chose this solution for
> DAX) but it has some consequences - e.g. that disk space cannot be
> reclaimed when someone has pagecache pages pinned (which may be unexpected
> from sysadmin POV) or that we have to be careful or eager application
> doing DIO (once it is converted to pinning) can block truncate
> indefinitely.

It's not unrelated ... once we figure out how to solve this problem,
the set_page_dirty() call happens while the page is still DMA-pinned,
so any solution can be applicable to both places.  Maybe the solution
to truncate vs DMA-pin won't be applicable to both ...

As far as badly behaved applications doing DMA-pinning blocking truncate()
goes, have we considered the possibility of declining the DMA pin if the
process does not own the mmaped file?  That would limit the amount of
trouble it can cause, but maybe it would break some interesting use cases.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()
  2022-01-10 21:10               ` Matthew Wilcox
@ 2022-01-17 12:07                 ` Jan Kara
  0 siblings, 0 replies; 72+ messages in thread
From: Jan Kara @ 2022-01-17 12:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jan Kara, John Hubbard, linux-mm, Andrew Morton, Christoph Hellwig

On Mon 10-01-22 21:10:36, Matthew Wilcox wrote:
> On Mon, Jan 10, 2022 at 09:36:11PM +0100, Jan Kara wrote:
> > On Mon 10-01-22 15:52:51, Matthew Wilcox wrote:
> > > On Mon, Jan 10, 2022 at 04:22:08PM +0100, Jan Kara wrote:
> > > > On Sun 09-01-22 00:01:49, John Hubbard wrote:
> > > > > On 1/8/22 20:39, Matthew Wilcox wrote:
> > > > > > On Wed, Jan 05, 2022 at 12:17:46AM -0800, John Hubbard wrote:
> > > > > > > > +		if (!folio_test_dirty(folio)) {
> > > > > > > > +			folio_lock(folio);
> > > > > > > > +			folio_mark_dirty(folio);
> > > > > > > > +			folio_unlock(folio);
> > > > > > > 
> > > > > > > At some point, maybe even here, I suspect that creating the folio
> > > > > > > version of set_page_dirty_lock() would help. I'm sure you have
> > > > > > > a better feel for whether it helps, after doing all of this conversion
> > > > > > > work, but it just sort of jumped out at me as surprising to see it
> > > > > > > in this form.
> > > > > > 
> > > > > > I really hate set_page_dirty_lock().  It smacks of "there is a locking
> > > > > > rule here which we're violating, so we'll just take the lock to fix it"
> > > > > > without understanding why there's a locking problem here.
> > > > > > 
> > > > > > As far as I can tell, originally, the intent was that you would lock
> > > > > > the page before modifying any of the data in the page.  ie you would
> > > > > > do:
> > > > > > 
> > > > > > 	gup()
> > > > > > 	lock_page()
> > > > > > 	addr = kmap_page()
> > > > > > 	*addr = 1;
> > > > > > 	kunmap_page()
> > > > > > 	set_page_dirty()
> > > > > > 	unlock_page()
> > > > > > 	put_page()
> > > > > > 
> > > > > > and that would prevent races between modifying the page and (starting)
> > > > > > writeback, not to mention truncate() and various other operations.
> > > > > > 
> > > > > > Clearly we can't do that for DMA-pinned pages.  There's only one lock
> > > > > > bit.  But do we even need to take the lock if we have the page pinned?
> > > > > > What are we protecting against?
> > > > > 
> > > > > This is a fun question, because you're asking it at a point when the
> > > > > overall problem remains unsolved. That is, the interaction between
> > > > > file-backed pages and gup/pup is still completely broken.
> > > > > 
> > > > > And I don't have an answer for you: it does seem like lock_page() is
> > > > > completely pointless here. Looking back, there are some 25 callers of
> > > > > unpin_user_pages_dirty_lock(), and during all those patch reviews, no
> > > > > one noticed this point!
> > > > 
> > > > I'd say it is underdocumented but not obviously pointless :) AFAIR (and
> > > > Christoph or Andrew may well correct me) the page lock in
> > > > set_page_dirty_lock() is there to protect metadata associated with the page
> > > > through page->private. Otherwise truncate could free these (e.g.
> > > > block_invalidatepage()) while ->set_page_dirty() callback (e.g.
> > > > __set_page_dirty_buffers()) works on this metadata.
> > > 
> > > Yes, but ... we have an inconsistency between DMA writes to the page and
> > > CPU writes to the page.
> > > 
> > > 	fd = open(file)
> > > 	write(fd, 1024 * 1024)
> > > 	mmap(NULL, 1024 * 1024, PROT_RW, MAP_SHARED, fd, 0)
> > > 	register-memory-with-RDMA
> > > 	ftruncate(fd, 0);	// page is removed from page cache
> > > 	ftruncate(fd, 1024 * 1024) 
> > > 
> > > Now if we do a store from the CPU, we instantiate a new page in the
> > > page cache and the store will be written back to the file.  If we do
> > > an RDMA-write, the write goes to the old page and will be lost.  Indeed,
> > > it's no longer visible to the CPU (but is visible to other RDMA reads!)
> > > 
> > > Which is fine if the program did it itself because it's doing something
> > > clearly bonkers, but another program might be the one doing the
> > > two truncate() steps, and this would surprise an innocent program.
> > > 
> > > I still favour blocking the truncate-down (or holepunch) until there
> > > are no pinned pages in the inode.  But I know this is a change in
> > > behaviour since for some reason, truncate() gets to override mmap().
> > 
> > I agree although this is unrelated to the page lock discussion above. In
> > principle we can consider such change (after all we chose this solution for
> > DAX) but it has some consequences - e.g. that disk space cannot be
> > reclaimed when someone has pagecache pages pinned (which may be unexpected
> > from sysadmin POV) or that we have to be careful or eager application
> > doing DIO (once it is converted to pinning) can block truncate
> > indefinitely.
> 
> It's not unrelated ... once we figure out how to solve this problem,
> the set_page_dirty() call happens while the page is still DMA-pinned,
> so any solution can be applicable to both places.  Maybe the solution
> to truncate vs DMA-pin won't be applicable to both ...
> 
> As far as badly behaved applications doing DMA-pinning blocking truncate()
> goes, have we considered the possibility of declining the DMA pin if the
> process does not own the mmaped file?  That would limit the amount of
> trouble it can cause, but maybe it would break some interesting use cases.

Sorry for delayed reply, this fell through the cracks. IMO this isn't going
to fly.
1) Direct IO needs to use DMA pinning for its buffers and you cannot
regress that by changing required permissions.
2) Also requiring file ownership for such operation looks a bit weird from
userspace POV. Userspace is just using mmaped files, why should it require
extra priviledges?
3) It would be definitive goodbye to GUP-fast for pinning.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2022-01-17 12:07 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-02 21:57 [PATCH 00/17] Convert GUP to folios Matthew Wilcox (Oracle)
2022-01-02 21:57 ` [PATCH 01/17] mm: Add folio_put_refs() Matthew Wilcox (Oracle)
2022-01-04  8:00   ` Christoph Hellwig
2022-01-04 21:15   ` John Hubbard
2022-01-02 21:57 ` [PATCH 02/17] mm: Add folio_pincount_available() Matthew Wilcox (Oracle)
2022-01-04  8:01   ` Christoph Hellwig
2022-01-04 18:25     ` Matthew Wilcox
2022-01-04 21:40   ` John Hubbard
2022-01-05  5:04     ` Matthew Wilcox
2022-01-05  6:24       ` John Hubbard
2022-01-02 21:57 ` [PATCH 03/17] mm: Add folio_pincount_ptr() Matthew Wilcox (Oracle)
2022-01-04  8:02   ` Christoph Hellwig
2022-01-04 21:43   ` John Hubbard
2022-01-06 21:57   ` William Kucharski
2022-01-02 21:57 ` [PATCH 04/17] mm: Convert page_maybe_dma_pinned() to use a folio Matthew Wilcox (Oracle)
2022-01-04  8:03   ` Christoph Hellwig
2022-01-04 22:01   ` John Hubbard
2022-01-02 21:57 ` [PATCH 05/17] gup: Add try_get_folio() Matthew Wilcox (Oracle)
2022-01-04  8:18   ` Christoph Hellwig
2022-01-05  1:25   ` John Hubbard
2022-01-05  7:00     ` John Hubbard
2022-01-07 18:23     ` Jason Gunthorpe
2022-01-08  1:37     ` Matthew Wilcox
2022-01-08  2:36       ` John Hubbard
2022-01-10 15:01       ` Jason Gunthorpe
2022-01-02 21:57 ` [PATCH 06/17] mm: Remove page_cache_add_speculative() and page_cache_get_speculative() Matthew Wilcox (Oracle)
2022-01-04  8:18   ` Christoph Hellwig
2022-01-05  1:29   ` John Hubbard
2022-01-02 21:57 ` [PATCH 07/17] gup: Add gup_put_folio() Matthew Wilcox (Oracle)
2022-01-04  8:22   ` Christoph Hellwig
2022-01-05  6:52   ` John Hubbard
2022-01-06 22:05   ` William Kucharski
2022-01-02 21:57 ` [PATCH 08/17] gup: Add try_grab_folio() Matthew Wilcox (Oracle)
2022-01-04  8:24   ` Christoph Hellwig
2022-01-05  7:06   ` John Hubbard
2022-01-02 21:57 ` [PATCH 09/17] gup: Convert gup_pte_range() to use a folio Matthew Wilcox (Oracle)
2022-01-04  8:25   ` Christoph Hellwig
2022-01-05  7:36   ` John Hubbard
2022-01-05  7:52     ` Matthew Wilcox
2022-01-05  7:57       ` John Hubbard
2022-01-02 21:57 ` [PATCH 10/17] gup: Convert gup_hugepte() " Matthew Wilcox (Oracle)
2022-01-04  8:26   ` Christoph Hellwig
2022-01-05  7:46   ` John Hubbard
2022-01-02 21:57 ` [PATCH 11/17] gup: Convert gup_huge_pmd() " Matthew Wilcox (Oracle)
2022-01-04  8:26   ` Christoph Hellwig
2022-01-05  7:50   ` John Hubbard
2022-01-02 21:57 ` [PATCH 12/17] gup: Convert gup_huge_pud() " Matthew Wilcox (Oracle)
2022-01-05  7:58   ` John Hubbard
2022-01-02 21:57 ` [PATCH 13/17] gup: Convert gup_huge_pgd() " Matthew Wilcox (Oracle)
2022-01-05  7:58   ` John Hubbard
2022-01-02 21:57 ` [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio() Matthew Wilcox (Oracle)
2022-01-04  8:32   ` Christoph Hellwig
2022-01-05  8:17   ` John Hubbard
2022-01-09  4:39     ` Matthew Wilcox
2022-01-09  8:01       ` John Hubbard
2022-01-10 15:22         ` Jan Kara
2022-01-10 15:52           ` Matthew Wilcox
2022-01-10 20:36             ` Jan Kara
2022-01-10 21:10               ` Matthew Wilcox
2022-01-17 12:07                 ` Jan Kara
2022-01-02 21:57 ` [PATCH 15/17] gup: Convert for_each_compound_range() to gup_for_each_folio_range() Matthew Wilcox (Oracle)
2022-01-04  8:35   ` Christoph Hellwig
2022-01-05  8:30   ` John Hubbard
2022-01-02 21:57 ` [PATCH 16/17] mm: Add isolate_lru_folio() Matthew Wilcox (Oracle)
2022-01-04  8:36   ` Christoph Hellwig
2022-01-05  8:44   ` John Hubbard
2022-01-06  0:34     ` Matthew Wilcox
2022-01-02 21:57 ` [PATCH 17/17] gup: Convert check_and_migrate_movable_pages() to use a folio Matthew Wilcox (Oracle)
2022-01-04  8:37   ` Christoph Hellwig
2022-01-05  9:00   ` John Hubbard
2022-01-06 22:12 ` [PATCH 00/17] Convert GUP to folios William Kucharski
2022-01-07 18:54 ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.