linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry
@ 2020-08-19 18:48 Matthew Wilcox (Oracle)
  2020-08-19 18:48 ` [PATCH 1/8] mm: Factor find_get_swap_page out of mincore_page Matthew Wilcox (Oracle)
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-19 18:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	Andrew Morton, Hugh Dickins, William Kucharski, Jani Nikula,
	Alexey Dobriyan, Johannes Weiner, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

This patch seris started out as part of the THP patch set, but it has
some nice effects along the way and it seems worth splitting it out and
submitting separately.

Currently find_get_entry() and find_lock_entry() return the page
corresponding to the requested index, but the first thing most callers do
is find the head page, which we just threw away.  As part of auditing
all the callers, I found some misuses of the APIs and some plain
inefficiencies that I've fixed.

The diffstat is unflattering, but I added more kernel-doc.

Matthew Wilcox (Oracle) (8):
  mm: Factor find_get_swap_page out of mincore_page
  mm: Use find_get_swap_page in memcontrol
  mm: Optimise madvise WILLNEED
  proc: Optimise smaps for shmem entries
  i915: Use find_lock_page instead of find_lock_entry
  mm: Convert find_get_entry to return the head page
  mm: Return head page from find_lock_entry
  mm: Hoist find_subpage call further up in pagecache_get_page

 drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  4 +--
 fs/proc/task_mmu.c                        |  8 +----
 include/linux/pagemap.h                   | 16 +++++++--
 include/linux/swap.h                      |  7 ++++
 mm/filemap.c                              | 41 +++++++++++------------
 mm/madvise.c                              | 21 +++++++-----
 mm/memcontrol.c                           | 25 ++------------
 mm/mincore.c                              | 28 ++--------------
 mm/shmem.c                                | 15 +++++----
 mm/swap_state.c                           | 31 +++++++++++++++++
 10 files changed, 98 insertions(+), 98 deletions(-)

-- 
2.28.0



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/8] mm: Factor find_get_swap_page out of mincore_page
  2020-08-19 18:48 [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry Matthew Wilcox (Oracle)
@ 2020-08-19 18:48 ` Matthew Wilcox (Oracle)
  2020-08-19 19:21   ` Matthew Wilcox
  2020-08-19 18:48 ` [PATCH 2/8] mm: Use find_get_swap_page in memcontrol Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-19 18:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	Andrew Morton, Hugh Dickins, William Kucharski, Jani Nikula,
	Alexey Dobriyan, Johannes Weiner, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

Provide this functionality from the swap cache.  It's useful for
more than just mincore().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/swap.h |  7 +++++++
 mm/mincore.c         | 28 ++--------------------------
 mm/swap_state.c      | 31 +++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 661046994db4..247527ea3e66 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -427,6 +427,7 @@ extern void free_pages_and_swap_cache(struct page **, int);
 extern struct page *lookup_swap_cache(swp_entry_t entry,
 				      struct vm_area_struct *vma,
 				      unsigned long addr);
+struct page *find_get_swap_page(struct address_space *mapping, pgoff_t index);
 extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
 			struct vm_area_struct *vma, unsigned long addr,
 			bool do_poll);
@@ -569,6 +570,12 @@ static inline struct page *lookup_swap_cache(swp_entry_t swp,
 	return NULL;
 }
 
+static inline
+struct page *find_get_swap_page(struct address_space *mapping, pgoff_t index)
+{
+	return find_get_page(mapping, index);
+}
+
 static inline int add_to_swap(struct page *page)
 {
 	return 0;
diff --git a/mm/mincore.c b/mm/mincore.c
index 453ff112470f..2806258f99c6 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -48,7 +48,7 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
  * and is up to date; i.e. that no page-in operation would be required
  * at this time if an application were to map and access this page.
  */
-static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
+static unsigned char mincore_page(struct address_space *mapping, pgoff_t index)
 {
 	unsigned char present = 0;
 	struct page *page;
@@ -59,31 +59,7 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 	 * any other file mapping (ie. marked !present and faulted in with
 	 * tmpfs's .fault). So swapped out tmpfs mappings are tested here.
 	 */
-#ifdef CONFIG_SWAP
-	if (shmem_mapping(mapping)) {
-		page = find_get_entry(mapping, pgoff);
-		/*
-		 * shmem/tmpfs may return swap: account for swapcache
-		 * page too.
-		 */
-		if (xa_is_value(page)) {
-			swp_entry_t swp = radix_to_swp_entry(page);
-			struct swap_info_struct *si;
-
-			/* Prevent swap device to being swapoff under us */
-			si = get_swap_device(swp);
-			if (si) {
-				page = find_get_page(swap_address_space(swp),
-						     swp_offset(swp));
-				put_swap_device(si);
-			} else
-				page = NULL;
-		}
-	} else
-		page = find_get_page(mapping, pgoff);
-#else
-	page = find_get_page(mapping, pgoff);
-#endif
+	page = find_get_swap_page(mapping, index);
 	if (page) {
 		present = PageUptodate(page);
 		put_page(page);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index c16eebb81d8b..986b4e3d3bad 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -414,6 +414,37 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct vm_area_struct *vma,
 	return page;
 }
 
+/**
+ * find_get_swap_page - Find and get a page from the page or swap caches.
+ * @mapping: The address_space to search.
+ * @index: The page cache index.
+ *
+ * This is like find_get_page(), but if we find a swap entry for
+ * shmem/tmpfs, we also look in the swap cache for the page.
+ *
+ * Return: The found page or %NULL.
+ */
+struct page *find_get_swap_page(struct address_space *mapping, pgoff_t index)
+{
+	swp_entry_t swp;
+	struct swap_info_struct *si;
+	struct page *page = find_get_entry(mapping, index);
+
+	if (!xa_is_value(page))
+		return page;
+	if (!shmem_mapping(mapping))
+		return NULL;
+
+	swp = radix_to_swp_entry(page);
+	/* Prevent swapoff from happening to us */
+	si = get_swap_device(swp);
+	if (!si)
+		return NULL;
+	page = find_get_page(swap_address_space(swp), swp_offset(swp));
+	put_swap_device(si);
+	return page;
+}
+
 struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr,
 			bool *new_page_allocated)
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/8] mm: Use find_get_swap_page in memcontrol
  2020-08-19 18:48 [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry Matthew Wilcox (Oracle)
  2020-08-19 18:48 ` [PATCH 1/8] mm: Factor find_get_swap_page out of mincore_page Matthew Wilcox (Oracle)
@ 2020-08-19 18:48 ` Matthew Wilcox (Oracle)
  2020-08-26 14:20   ` Johannes Weiner
  2020-08-19 18:48 ` [PATCH 3/8] mm: Optimise madvise WILLNEED Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-19 18:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	Andrew Morton, Hugh Dickins, William Kucharski, Jani Nikula,
	Alexey Dobriyan, Johannes Weiner, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

The current code does not protect against swapoff of the underlying
swap device, so this is a bug fix as well as a worthwhile reduction in
code complexity.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memcontrol.c | 25 ++-----------------------
 1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b807952b4d43..4075f214a841 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5539,35 +5539,14 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
 static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
 			unsigned long addr, pte_t ptent, swp_entry_t *entry)
 {
-	struct page *page = NULL;
-	struct address_space *mapping;
-	pgoff_t pgoff;
-
 	if (!vma->vm_file) /* anonymous vma */
 		return NULL;
 	if (!(mc.flags & MOVE_FILE))
 		return NULL;
 
-	mapping = vma->vm_file->f_mapping;
-	pgoff = linear_page_index(vma, addr);
-
 	/* page is moved even if it's not RSS of this task(page-faulted). */
-#ifdef CONFIG_SWAP
-	/* shmem/tmpfs may report page out on swap: account for that too. */
-	if (shmem_mapping(mapping)) {
-		page = find_get_entry(mapping, pgoff);
-		if (xa_is_value(page)) {
-			swp_entry_t swp = radix_to_swp_entry(page);
-			*entry = swp;
-			page = find_get_page(swap_address_space(swp),
-					     swp_offset(swp));
-		}
-	} else
-		page = find_get_page(mapping, pgoff);
-#else
-	page = find_get_page(mapping, pgoff);
-#endif
-	return page;
+	return find_get_swap_page(vma->vm_file->f_mapping,
+			linear_page_index(vma, addr));
 }
 
 /**
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/8] mm: Optimise madvise WILLNEED
  2020-08-19 18:48 [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry Matthew Wilcox (Oracle)
  2020-08-19 18:48 ` [PATCH 1/8] mm: Factor find_get_swap_page out of mincore_page Matthew Wilcox (Oracle)
  2020-08-19 18:48 ` [PATCH 2/8] mm: Use find_get_swap_page in memcontrol Matthew Wilcox (Oracle)
@ 2020-08-19 18:48 ` Matthew Wilcox (Oracle)
  2020-08-26 14:23   ` Johannes Weiner
  2020-08-19 18:48 ` [PATCH 4/8] proc: Optimise smaps for shmem entries Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-19 18:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	Andrew Morton, Hugh Dickins, William Kucharski, Jani Nikula,
	Alexey Dobriyan, Johannes Weiner, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

Instead of calling find_get_entry() for every page index, use an XArray
iterator to skip over NULL entries, and avoid calling get_page(),
because we only want the swap entries.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/madvise.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index dd1d43cf026d..96189acd6969 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -224,25 +224,28 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end,
 		struct address_space *mapping)
 {
-	pgoff_t index;
+	XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start));
+	pgoff_t end_index = end / PAGE_SIZE;
 	struct page *page;
-	swp_entry_t swap;
 
-	for (; start < end; start += PAGE_SIZE) {
-		index = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+	rcu_read_lock();
+	xas_for_each(&xas, page, end_index) {
+		swp_entry_t swap;
 
-		page = find_get_entry(mapping, index);
-		if (!xa_is_value(page)) {
-			if (page)
-				put_page(page);
+		if (!xa_is_value(page))
 			continue;
-		}
+		rcu_read_unlock();
+
 		swap = radix_to_swp_entry(page);
 		page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE,
 							NULL, 0, false);
 		if (page)
 			put_page(page);
+
+		rcu_read_lock();
+		xas_reset(&xas);
 	}
+	rcu_read_unlock();
 
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 }
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/8] proc: Optimise smaps for shmem entries
  2020-08-19 18:48 [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2020-08-19 18:48 ` [PATCH 3/8] mm: Optimise madvise WILLNEED Matthew Wilcox (Oracle)
@ 2020-08-19 18:48 ` Matthew Wilcox (Oracle)
  2020-08-26 14:24   ` Johannes Weiner
  2020-08-19 18:48 ` [PATCH 5/8] i915: Use find_lock_page instead of find_lock_entry Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-19 18:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	Andrew Morton, Hugh Dickins, William Kucharski, Jani Nikula,
	Alexey Dobriyan, Johannes Weiner, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

Avoid bumping the refcount on pages when we're only interested in the
swap entries.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/proc/task_mmu.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5066b0251ed8..e42d9e5e9a3c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -520,16 +520,10 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 			page = device_private_entry_to_page(swpent);
 	} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
 							&& pte_none(*pte))) {
-		page = find_get_entry(vma->vm_file->f_mapping,
+		page = xa_load(&vma->vm_file->f_mapping->i_pages,
 						linear_page_index(vma, addr));
-		if (!page)
-			return;
-
 		if (xa_is_value(page))
 			mss->swap += PAGE_SIZE;
-		else
-			put_page(page);
-
 		return;
 	}
 
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/8] i915: Use find_lock_page instead of find_lock_entry
  2020-08-19 18:48 [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2020-08-19 18:48 ` [PATCH 4/8] proc: Optimise smaps for shmem entries Matthew Wilcox (Oracle)
@ 2020-08-19 18:48 ` Matthew Wilcox (Oracle)
  2020-08-26 14:27   ` Johannes Weiner
  2020-08-19 18:48 ` [PATCH 6/8] mm: Convert find_get_entry to return the head page Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-19 18:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	Andrew Morton, Hugh Dickins, William Kucharski, Jani Nikula,
	Alexey Dobriyan, Johannes Weiner, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

i915 does not want to see value entries.  Switch it to use
find_lock_page() instead, and remove the export of find_lock_entry().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 ++--
 mm/filemap.c                              | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 38113d3c0138..75e8b71c18b9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -258,8 +258,8 @@ shmem_writeback(struct drm_i915_gem_object *obj)
 	for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
 		struct page *page;
 
-		page = find_lock_entry(mapping, i);
-		if (!page || xa_is_value(page))
+		page = find_lock_page(mapping, i);
+		if (!page)
 			continue;
 
 		if (!page_mapped(page) && clear_page_dirty_for_io(page)) {
diff --git a/mm/filemap.c b/mm/filemap.c
index 1aaea26556cc..36067cf7f8c5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1628,7 +1628,6 @@ struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset)
 	}
 	return page;
 }
-EXPORT_SYMBOL(find_lock_entry);
 
 /**
  * pagecache_get_page - Find and get a reference to a page.
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/8] mm: Convert find_get_entry to return the head page
  2020-08-19 18:48 [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2020-08-19 18:48 ` [PATCH 5/8] i915: Use find_lock_page instead of find_lock_entry Matthew Wilcox (Oracle)
@ 2020-08-19 18:48 ` Matthew Wilcox (Oracle)
  2020-08-26 15:09   ` Johannes Weiner
  2020-08-19 18:48 ` [PATCH 7/8] mm: Return head page from find_lock_entry Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-19 18:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	Andrew Morton, Hugh Dickins, William Kucharski, Jani Nikula,
	Alexey Dobriyan, Johannes Weiner, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

There are only three callers remaining of find_get_entry().
find_get_swap_page() is happy to get the head page instead of the subpage.
Add find_subpage() calls to find_lock_entry() and pagecache_get_page()
to avoid auditing all their callers.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h |  4 ++--
 mm/filemap.c            | 13 +++++++------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7de11dcd534d..8261c64f592d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -384,8 +384,8 @@ static inline struct page *find_subpage(struct page *head, pgoff_t index)
 	return head + (index & (thp_nr_pages(head) - 1));
 }
 
-struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
-struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset);
+struct page *find_get_entry(struct address_space *mapping, pgoff_t index);
+struct page *find_lock_entry(struct address_space *mapping, pgoff_t index);
 unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
 			  unsigned int nr_entries, struct page **entries,
 			  pgoff_t *indices);
diff --git a/mm/filemap.c b/mm/filemap.c
index 36067cf7f8c5..5a87e1bdddd6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1547,19 +1547,19 @@ EXPORT_SYMBOL(page_cache_prev_miss);
 /**
  * find_get_entry - find and get a page cache entry
  * @mapping: the address_space to search
- * @offset: the page cache index
+ * @index: The page cache index.
  *
  * Looks up the page cache slot at @mapping & @offset.  If there is a
- * page cache page, it is returned with an increased refcount.
+ * page cache page, the head page is returned with an increased refcount.
  *
  * If the slot holds a shadow entry of a previously evicted page, or a
  * swap entry from shmem/tmpfs, it is returned.
  *
- * Return: the found page or shadow entry, %NULL if nothing is found.
+ * Return: The head page or shadow entry, %NULL if nothing is found.
  */
-struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
+struct page *find_get_entry(struct address_space *mapping, pgoff_t index)
 {
-	XA_STATE(xas, &mapping->i_pages, offset);
+	XA_STATE(xas, &mapping->i_pages, index);
 	struct page *page;
 
 	rcu_read_lock();
@@ -1587,7 +1587,6 @@ struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
 		put_page(page);
 		goto repeat;
 	}
-	page = find_subpage(page, offset);
 out:
 	rcu_read_unlock();
 
@@ -1624,6 +1623,7 @@ struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset)
 			put_page(page);
 			goto repeat;
 		}
+		page = find_subpage(page, offset);
 		VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);
 	}
 	return page;
@@ -1670,6 +1670,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 		page = NULL;
 	if (!page)
 		goto no_page;
+	page = find_subpage(page, index);
 
 	if (fgp_flags & FGP_LOCK) {
 		if (fgp_flags & FGP_NOWAIT) {
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 7/8] mm: Return head page from find_lock_entry
  2020-08-19 18:48 [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2020-08-19 18:48 ` [PATCH 6/8] mm: Convert find_get_entry to return the head page Matthew Wilcox (Oracle)
@ 2020-08-19 18:48 ` Matthew Wilcox (Oracle)
  2020-08-19 18:48 ` [PATCH 8/8] mm: Hoist find_subpage call further up in pagecache_get_page Matthew Wilcox (Oracle)
  2020-08-21 17:37 ` [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry William Kucharski
  8 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-19 18:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	Andrew Morton, Hugh Dickins, William Kucharski, Jani Nikula,
	Alexey Dobriyan, Johannes Weiner, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

Convert the one caller of find_lock_entry() to cope with a head page
being returned instead of the subpage for the index.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 12 ++++++++++++
 mm/filemap.c            | 25 +++++++++++--------------
 mm/shmem.c              | 15 ++++++++-------
 3 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8261c64f592d..de8f3758ceaa 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -371,6 +371,18 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
 			mapping_gfp_mask(mapping));
 }
 
+/*
+ * Is this index for one of the subpages of this page?
+ * This should always be called for a locked, non-hugetlbfs page.
+ */
+static inline bool thp_valid_index(struct page *head, pgoff_t index)
+{
+	VM_BUG_ON_PAGE(PageHuge(head), head);
+	VM_BUG_ON_PAGE(!PageLocked(head), head);
+
+	return head->index == (index & ~(thp_nr_pages(head) - 1));
+}
+
 /*
  * Given the page we found in the page cache, return the page corresponding
  * to this index in the file
diff --git a/mm/filemap.c b/mm/filemap.c
index 5a87e1bdddd6..6594baae7cd2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1594,37 +1594,34 @@ struct page *find_get_entry(struct address_space *mapping, pgoff_t index)
 }
 
 /**
- * find_lock_entry - locate, pin and lock a page cache entry
- * @mapping: the address_space to search
- * @offset: the page cache index
+ * find_lock_entry - Locate and lock a page cache entry.
+ * @mapping: The address_space to search.
+ * @index: The page cache index.
  *
- * Looks up the page cache slot at @mapping & @offset.  If there is a
- * page cache page, it is returned locked and with an increased
- * refcount.
+ * Looks up the page at @mapping & @index.  If there is a page in the
+ * cache, the head page is returned locked and with an increased refcount.
  *
  * If the slot holds a shadow entry of a previously evicted page, or a
  * swap entry from shmem/tmpfs, it is returned.
  *
- * find_lock_entry() may sleep.
- *
- * Return: the found page or shadow entry, %NULL if nothing is found.
+ * Context: May sleep.
+ * Return: The head page or shadow entry, %NULL if nothing is found.
  */
-struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset)
+struct page *find_lock_entry(struct address_space *mapping, pgoff_t index)
 {
 	struct page *page;
 
 repeat:
-	page = find_get_entry(mapping, offset);
+	page = find_get_entry(mapping, index);
 	if (page && !xa_is_value(page)) {
 		lock_page(page);
 		/* Has the page been truncated? */
-		if (unlikely(page_mapping(page) != mapping)) {
+		if (unlikely(page->mapping != mapping)) {
 			unlock_page(page);
 			put_page(page);
 			goto repeat;
 		}
-		page = find_subpage(page, offset);
-		VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);
+		VM_BUG_ON_PAGE(!thp_valid_index(page, index), page);
 	}
 	return page;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 271548ca20f3..bbb43e9d35df 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1834,8 +1834,9 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 		page = NULL;
 	}
 	if (page || sgp == SGP_READ) {
-		*pagep = page;
-		return 0;
+		if (page && PageTransHuge(page))
+			hindex = round_down(index, HPAGE_PMD_NR);
+		goto out;
 	}
 
 	/*
@@ -1961,14 +1962,13 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	 * it now, lest undo on failure cancel our earlier guarantee.
 	 */
 	if (sgp != SGP_WRITE && !PageUptodate(page)) {
-		struct page *head = compound_head(page);
 		int i;
 
-		for (i = 0; i < compound_nr(head); i++) {
-			clear_highpage(head + i);
-			flush_dcache_page(head + i);
+		for (i = 0; i < compound_nr(page); i++) {
+			clear_highpage(page + i);
+			flush_dcache_page(page + i);
 		}
-		SetPageUptodate(head);
+		SetPageUptodate(page);
 	}
 
 	/* Perhaps the file has been truncated since we checked */
@@ -1984,6 +1984,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 		error = -EINVAL;
 		goto unlock;
 	}
+out:
 	*pagep = page + index - hindex;
 	return 0;
 
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 8/8] mm: Hoist find_subpage call further up in pagecache_get_page
  2020-08-19 18:48 [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2020-08-19 18:48 ` [PATCH 7/8] mm: Return head page from find_lock_entry Matthew Wilcox (Oracle)
@ 2020-08-19 18:48 ` Matthew Wilcox (Oracle)
  2020-08-21 17:37 ` [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry William Kucharski
  8 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-19 18:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	Andrew Morton, Hugh Dickins, William Kucharski, Jani Nikula,
	Alexey Dobriyan, Johannes Weiner, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

This avoids a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 6594baae7cd2..8c354277108d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1667,7 +1667,6 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 		page = NULL;
 	if (!page)
 		goto no_page;
-	page = find_subpage(page, index);
 
 	if (fgp_flags & FGP_LOCK) {
 		if (fgp_flags & FGP_NOWAIT) {
@@ -1680,12 +1679,12 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 		}
 
 		/* Has the page been truncated? */
-		if (unlikely(compound_head(page)->mapping != mapping)) {
+		if (unlikely(page->mapping != mapping)) {
 			unlock_page(page);
 			put_page(page);
 			goto repeat;
 		}
-		VM_BUG_ON_PAGE(page->index != index, page);
+		VM_BUG_ON_PAGE(!thp_valid_index(page, index), page);
 	}
 
 	if (fgp_flags & FGP_ACCESSED)
@@ -1695,6 +1694,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 		if (page_is_idle(page))
 			clear_page_idle(page);
 	}
+	page = find_subpage(page, index);
 
 no_page:
 	if (!page && (fgp_flags & FGP_CREAT)) {
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/8] mm: Factor find_get_swap_page out of mincore_page
  2020-08-19 18:48 ` [PATCH 1/8] mm: Factor find_get_swap_page out of mincore_page Matthew Wilcox (Oracle)
@ 2020-08-19 19:21   ` Matthew Wilcox
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2020-08-19 19:21 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Hugh Dickins, William Kucharski, Jani Nikula,
	Alexey Dobriyan, Johannes Weiner, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

On Wed, Aug 19, 2020 at 07:48:43PM +0100, Matthew Wilcox (Oracle) wrote:
> Provide this functionality from the swap cache.  It's useful for
> more than just mincore().

The build bot showed I was missing this for some configs:

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 986b4e3d3bad..92a1f40be589 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -21,6 +21,7 @@
 #include <linux/vmalloc.h>
 #include <linux/swap_slots.h>
 #include <linux/huge_mm.h>
+#include <linux/shmem_fs.h>
 #include "internal.h"
 
 /*

I'll wait for more feedback before reposting the series in ~24 hours.


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry
  2020-08-19 18:48 [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2020-08-19 18:48 ` [PATCH 8/8] mm: Hoist find_subpage call further up in pagecache_get_page Matthew Wilcox (Oracle)
@ 2020-08-21 17:37 ` William Kucharski
  8 siblings, 0 replies; 21+ messages in thread
From: William Kucharski @ 2020-08-21 17:37 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Hugh Dickins, Jani Nikula,
	Alexey Dobriyan, Johannes Weiner, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel



> On Aug 19, 2020, at 12:48 PM, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
> 
> This patch seris started out as part of the THP patch set, but it has
> some nice effects along the way and it seems worth splitting it out and
> submitting separately.
> 
> Currently find_get_entry() and find_lock_entry() return the page
> corresponding to the requested index, but the first thing most callers do
> is find the head page, which we just threw away.  As part of auditing
> all the callers, I found some misuses of the APIs and some plain
> inefficiencies that I've fixed.
> 
> The diffstat is unflattering, but I added more kernel-doc.
> 
> Matthew Wilcox (Oracle) (8):
>  mm: Factor find_get_swap_page out of mincore_page
>  mm: Use find_get_swap_page in memcontrol
>  mm: Optimise madvise WILLNEED
>  proc: Optimise smaps for shmem entries
>  i915: Use find_lock_page instead of find_lock_entry
>  mm: Convert find_get_entry to return the head page
>  mm: Return head page from find_lock_entry
>  mm: Hoist find_subpage call further up in pagecache_get_page
> 
> drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  4 +--
> fs/proc/task_mmu.c                        |  8 +----
> include/linux/pagemap.h                   | 16 +++++++--
> include/linux/swap.h                      |  7 ++++
> mm/filemap.c                              | 41 +++++++++++------------
> mm/madvise.c                              | 21 +++++++-----
> mm/memcontrol.c                           | 25 ++------------
> mm/mincore.c                              | 28 ++--------------
> mm/shmem.c                                | 15 +++++----
> mm/swap_state.c                           | 31 +++++++++++++++++
> 10 files changed, 98 insertions(+), 98 deletions(-)

For the series:

Reviewed-by: William Kucharski <william.kucharski@oracle.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/8] mm: Use find_get_swap_page in memcontrol
  2020-08-19 18:48 ` [PATCH 2/8] mm: Use find_get_swap_page in memcontrol Matthew Wilcox (Oracle)
@ 2020-08-26 14:20   ` Johannes Weiner
  2020-08-26 14:54     ` Matthew Wilcox
  2020-08-27 12:59     ` Matthew Wilcox
  0 siblings, 2 replies; 21+ messages in thread
From: Johannes Weiner @ 2020-08-26 14:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Hugh Dickins, William Kucharski,
	Jani Nikula, Alexey Dobriyan, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

On Wed, Aug 19, 2020 at 07:48:44PM +0100, Matthew Wilcox (Oracle) wrote:
> The current code does not protect against swapoff of the underlying
> swap device, so this is a bug fix as well as a worthwhile reduction in
> code complexity.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/memcontrol.c | 25 ++-----------------------
>  1 file changed, 2 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b807952b4d43..4075f214a841 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5539,35 +5539,14 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
>  static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
>  			unsigned long addr, pte_t ptent, swp_entry_t *entry)
>  {
> -	struct page *page = NULL;
> -	struct address_space *mapping;
> -	pgoff_t pgoff;
> -
>  	if (!vma->vm_file) /* anonymous vma */
>  		return NULL;
>  	if (!(mc.flags & MOVE_FILE))
>  		return NULL;
>  
> -	mapping = vma->vm_file->f_mapping;
> -	pgoff = linear_page_index(vma, addr);
> -
>  	/* page is moved even if it's not RSS of this task(page-faulted). */
> -#ifdef CONFIG_SWAP
> -	/* shmem/tmpfs may report page out on swap: account for that too. */
> -	if (shmem_mapping(mapping)) {
> -		page = find_get_entry(mapping, pgoff);
> -		if (xa_is_value(page)) {
> -			swp_entry_t swp = radix_to_swp_entry(page);
> -			*entry = swp;
> -			page = find_get_page(swap_address_space(swp),
> -					     swp_offset(swp));
> -		}
> -	} else
> -		page = find_get_page(mapping, pgoff);
> -#else
> -	page = find_get_page(mapping, pgoff);
> -#endif
> -	return page;
> +	return find_get_swap_page(vma->vm_file->f_mapping,
> +			linear_page_index(vma, addr));

The refactor makes sense to me, but the name is confusing. We're not
looking for a swap page, we're primarily looking for a file page in
the page cache mapping that's handed in. Only in the special case
where it's a shmem mapping and there is a swap entry do we consult the
auxiliary swap cache.

How about find_get_page_or_swapcache()? find_get_page_shmemswap()?
Maybe you have a better idea. It's a fairly specialized operation that
isn't widely used, so a longer name isn't a bad thing IMO.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/8] mm: Optimise madvise WILLNEED
  2020-08-19 18:48 ` [PATCH 3/8] mm: Optimise madvise WILLNEED Matthew Wilcox (Oracle)
@ 2020-08-26 14:23   ` Johannes Weiner
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2020-08-26 14:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Hugh Dickins, William Kucharski,
	Jani Nikula, Alexey Dobriyan, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

On Wed, Aug 19, 2020 at 07:48:45PM +0100, Matthew Wilcox (Oracle) wrote:
> Instead of calling find_get_entry() for every page index, use an XArray
> iterator to skip over NULL entries, and avoid calling get_page(),
> because we only want the swap entries.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/8] proc: Optimise smaps for shmem entries
  2020-08-19 18:48 ` [PATCH 4/8] proc: Optimise smaps for shmem entries Matthew Wilcox (Oracle)
@ 2020-08-26 14:24   ` Johannes Weiner
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2020-08-26 14:24 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Hugh Dickins, William Kucharski,
	Jani Nikula, Alexey Dobriyan, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

On Wed, Aug 19, 2020 at 07:48:46PM +0100, Matthew Wilcox (Oracle) wrote:
> Avoid bumping the refcount on pages when we're only interested in the
> swap entries.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/8] i915: Use find_lock_page instead of find_lock_entry
  2020-08-19 18:48 ` [PATCH 5/8] i915: Use find_lock_page instead of find_lock_entry Matthew Wilcox (Oracle)
@ 2020-08-26 14:27   ` Johannes Weiner
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2020-08-26 14:27 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Hugh Dickins, William Kucharski,
	Jani Nikula, Alexey Dobriyan, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

On Wed, Aug 19, 2020 at 07:48:47PM +0100, Matthew Wilcox (Oracle) wrote:
> i915 does not want to see value entries.  Switch it to use
> find_lock_page() instead, and remove the export of find_lock_entry().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/8] mm: Use find_get_swap_page in memcontrol
  2020-08-26 14:20   ` Johannes Weiner
@ 2020-08-26 14:54     ` Matthew Wilcox
  2020-08-26 16:26       ` Johannes Weiner
  2020-08-27 12:59     ` Matthew Wilcox
  1 sibling, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2020-08-26 14:54 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Andrew Morton, Hugh Dickins, William Kucharski,
	Jani Nikula, Alexey Dobriyan, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

On Wed, Aug 26, 2020 at 10:20:02AM -0400, Johannes Weiner wrote:
> On Wed, Aug 19, 2020 at 07:48:44PM +0100, Matthew Wilcox (Oracle) wrote:
> > +	return find_get_swap_page(vma->vm_file->f_mapping,
> > +			linear_page_index(vma, addr));
> 
> The refactor makes sense to me, but the name is confusing. We're not
> looking for a swap page, we're primarily looking for a file page in
> the page cache mapping that's handed in. Only in the special case
> where it's a shmem mapping and there is a swap entry do we consult the
> auxiliary swap cache.
> 
> How about find_get_page_or_swapcache()? find_get_page_shmemswap()?
> Maybe you have a better idea. It's a fairly specialized operation that
> isn't widely used, so a longer name isn't a bad thing IMO.

Yeah, I had trouble with the naming here too.

get_page_even_from_swap()
find_get_shmem_page()

or maybe refactor the whole thing:

	struct page *page = find_get_entry(mapping, index);
	page = find_swap_page(mapping, page);

struct page *find_swap_page(struct address_space *mapping, struct page *page)
{
	swp_entry_t swp;
	struct swap_info_struct *si;

	if (!xa_is_value(page))
		return page;
	if (!shmem_mapping(mapping))
		return NULL;

	...
}


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/8] mm: Convert find_get_entry to return the head page
  2020-08-19 18:48 ` [PATCH 6/8] mm: Convert find_get_entry to return the head page Matthew Wilcox (Oracle)
@ 2020-08-26 15:09   ` Johannes Weiner
  2020-08-26 15:48     ` Matthew Wilcox
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2020-08-26 15:09 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, Andrew Morton, Hugh Dickins, William Kucharski,
	Jani Nikula, Alexey Dobriyan, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

On Wed, Aug 19, 2020 at 07:48:48PM +0100, Matthew Wilcox (Oracle) wrote:
> There are only three callers remaining of find_get_entry().
> find_get_swap_page() is happy to get the head page instead of the subpage.
> Add find_subpage() calls to find_lock_entry() and pagecache_get_page()
> to avoid auditing all their callers.

I believe this would cause a subtle bug in memcg charge moving for pte
mapped huge pages. We currently skip over tail pages in the range
(they don't have page->mem_cgroup set) and account for the huge page
once from the headpage. After this change, we would see the headpage
and account for it 512 times (or whatever the number is on non-x86).

But that aside, I don't quite understand the intent.

Before, all these functions simply return the base page at @index,
whether it's a regular page or a tail page.

Afterwards, find_lock_entry(), find_get_page() et al still do, but
find_get_entry() returns headpage at @index & HPAGE_CACHE_INDEX_MASK.

Shouldn't we be consistent about how we handle huge pages when
somebody queries the tree for a given base page index?

[ Wouldn't that mean that e.g. find_get_swap_page() would return tail
  pages for regular files and head pages for shmem files? ]


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/8] mm: Convert find_get_entry to return the head page
  2020-08-26 15:09   ` Johannes Weiner
@ 2020-08-26 15:48     ` Matthew Wilcox
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2020-08-26 15:48 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Andrew Morton, Hugh Dickins, William Kucharski,
	Jani Nikula, Alexey Dobriyan, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

On Wed, Aug 26, 2020 at 11:09:25AM -0400, Johannes Weiner wrote:
> On Wed, Aug 19, 2020 at 07:48:48PM +0100, Matthew Wilcox (Oracle) wrote:
> > There are only three callers remaining of find_get_entry().
> > find_get_swap_page() is happy to get the head page instead of the subpage.
> > Add find_subpage() calls to find_lock_entry() and pagecache_get_page()
> > to avoid auditing all their callers.
> 
> I believe this would cause a subtle bug in memcg charge moving for pte
> mapped huge pages. We currently skip over tail pages in the range
> (they don't have page->mem_cgroup set) and account for the huge page
> once from the headpage. After this change, we would see the headpage
> and account for it 512 times (or whatever the number is on non-x86).

Hmm ... so if you have the last 511 pages of a huge page mapped, you
actually don't charge for it at all today?

I think you're right that I'd introduce this bug, and so that needs to
be fixed.

> But that aside, I don't quite understand the intent.
> 
> Before, all these functions simply return the base page at @index,
> whether it's a regular page or a tail page.
> 
> Afterwards, find_lock_entry(), find_get_page() et al still do, but
> find_get_entry() returns headpage at @index & HPAGE_CACHE_INDEX_MASK.
> 
> Shouldn't we be consistent about how we handle huge pages when
> somebody queries the tree for a given base page index?
> 
> [ Wouldn't that mean that e.g. find_get_swap_page() would return tail
>   pages for regular files and head pages for shmem files? ]

What I'd _like_ to do is convert all the callers to cope with tail
pages never being returned from all the find_* functions.  That seems
like a lot of disruption.

My intent in this series is to get all the find_*_entr{y,ies}
functions to the point where they don't return tail pages.
Also find_get_pages_tag() because tags are only set on head pages.

This is generally what the callers want anyway.  There's even a hack
in find_get_entries() in current to terminate early on finding a THP
(see commit 71725ed10c40696dc6bdccf8e225815dcef24dba).  If I want
to remove that, I need to do _something_ to not put all the subpages
of a THP into the pagevec.

So the new rule will be that find_*_entry() don't return tail pages but
find_*_page() do.  With the full THP patchset in place, THPs become quite
common, so bugs in this area will surface quickly instead of lingering
for years and only popping out in rare circumstances.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/8] mm: Use find_get_swap_page in memcontrol
  2020-08-26 14:54     ` Matthew Wilcox
@ 2020-08-26 16:26       ` Johannes Weiner
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2020-08-26 16:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Andrew Morton, Hugh Dickins, William Kucharski,
	Jani Nikula, Alexey Dobriyan, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

On Wed, Aug 26, 2020 at 03:54:14PM +0100, Matthew Wilcox wrote:
> On Wed, Aug 26, 2020 at 10:20:02AM -0400, Johannes Weiner wrote:
> > On Wed, Aug 19, 2020 at 07:48:44PM +0100, Matthew Wilcox (Oracle) wrote:
> > > +	return find_get_swap_page(vma->vm_file->f_mapping,
> > > +			linear_page_index(vma, addr));
> > 
> > The refactor makes sense to me, but the name is confusing. We're not
> > looking for a swap page, we're primarily looking for a file page in
> > the page cache mapping that's handed in. Only in the special case
> > where it's a shmem mapping and there is a swap entry do we consult the
> > auxiliary swap cache.
> > 
> > How about find_get_page_or_swapcache()? find_get_page_shmemswap()?
> > Maybe you have a better idea. It's a fairly specialized operation that
> > isn't widely used, so a longer name isn't a bad thing IMO.
> 
> Yeah, I had trouble with the naming here too.
> 
> get_page_even_from_swap()
> find_get_shmem_page()
> 
> or maybe refactor the whole thing:
> 
> 	struct page *page = find_get_entry(mapping, index);
> 	page = find_swap_page(mapping, page);
> 
> struct page *find_swap_page(struct address_space *mapping, struct page *page)
> {
> 	swp_entry_t swp;
> 	struct swap_info_struct *si;
> 
> 	if (!xa_is_value(page))
> 		return page;
> 	if (!shmem_mapping(mapping))
> 		return NULL;
> 
> 	...
> }

Yeah, I like the idea of two lookups if we can't find a good name for
the operation that combines them. I'd just bubble the control flow
that links them up to the callsite - that still seems plenty compact
for two callsites, and keeps all the shmem magic in shmem code:

	page = find_get_entry(mapping, index);
	if (xa_is_value(page))
		if (shmem_mapping(mapping))
			page = lookup_shmem_swap_cache(page);
		else
			page = NULL;

So close to making radix_to_swp_entry() & co. private to shmem.c, too
- if it weren't for force_shm_swapin_readahead(). Ah well.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/8] mm: Use find_get_swap_page in memcontrol
  2020-08-26 14:20   ` Johannes Weiner
  2020-08-26 14:54     ` Matthew Wilcox
@ 2020-08-27 12:59     ` Matthew Wilcox
  2020-08-27 14:22       ` Johannes Weiner
  1 sibling, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2020-08-27 12:59 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Andrew Morton, Hugh Dickins, William Kucharski,
	Jani Nikula, Alexey Dobriyan, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

On Wed, Aug 26, 2020 at 10:20:02AM -0400, Johannes Weiner wrote:
> The refactor makes sense to me, but the name is confusing. We're not
> looking for a swap page, we're primarily looking for a file page in
> the page cache mapping that's handed in. Only in the special case
> where it's a shmem mapping and there is a swap entry do we consult the
> auxiliary swap cache.
> 
> How about find_get_page_or_swapcache()? find_get_page_shmemswap()?
> Maybe you have a better idea. It's a fairly specialized operation that
> isn't widely used, so a longer name isn't a bad thing IMO.

Got it.  find_get_incore_page().  I was going to go with inmem, but that
it matches mincore sold me on it.

/**
 * find_get_incore_page - Find and get a page from the page or swap caches.
 * @mapping: The address_space to search.
 * @index: The page cache index.
 *
 * This differs from find_get_page() in that it will also look for the
 * page in the swap cache.
 *
 * Return: The found page or %NULL.
 */

I was focusing too much on what the function did, not why it was doing it.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/8] mm: Use find_get_swap_page in memcontrol
  2020-08-27 12:59     ` Matthew Wilcox
@ 2020-08-27 14:22       ` Johannes Weiner
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2020-08-27 14:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Andrew Morton, Hugh Dickins, William Kucharski,
	Jani Nikula, Alexey Dobriyan, Chris Wilson, Matthew Auld,
	Huang Ying, intel-gfx, cgroups, linux-kernel

On Thu, Aug 27, 2020 at 01:59:41PM +0100, Matthew Wilcox wrote:
> On Wed, Aug 26, 2020 at 10:20:02AM -0400, Johannes Weiner wrote:
> > The refactor makes sense to me, but the name is confusing. We're not
> > looking for a swap page, we're primarily looking for a file page in
> > the page cache mapping that's handed in. Only in the special case
> > where it's a shmem mapping and there is a swap entry do we consult the
> > auxiliary swap cache.
> > 
> > How about find_get_page_or_swapcache()? find_get_page_shmemswap()?
> > Maybe you have a better idea. It's a fairly specialized operation that
> > isn't widely used, so a longer name isn't a bad thing IMO.
> 
> Got it.  find_get_incore_page().  I was going to go with inmem, but that
> it matches mincore sold me on it.
>
> /**
>  * find_get_incore_page - Find and get a page from the page or swap caches.
>  * @mapping: The address_space to search.
>  * @index: The page cache index.
>  *
>  * This differs from find_get_page() in that it will also look for the
>  * page in the swap cache.
>  *
>  * Return: The found page or %NULL.
>  */

Nice work, that's perfect.

> I was focusing too much on what the function did, not why it was doing it.

Me too.


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-08-27 14:23 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-19 18:48 [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry Matthew Wilcox (Oracle)
2020-08-19 18:48 ` [PATCH 1/8] mm: Factor find_get_swap_page out of mincore_page Matthew Wilcox (Oracle)
2020-08-19 19:21   ` Matthew Wilcox
2020-08-19 18:48 ` [PATCH 2/8] mm: Use find_get_swap_page in memcontrol Matthew Wilcox (Oracle)
2020-08-26 14:20   ` Johannes Weiner
2020-08-26 14:54     ` Matthew Wilcox
2020-08-26 16:26       ` Johannes Weiner
2020-08-27 12:59     ` Matthew Wilcox
2020-08-27 14:22       ` Johannes Weiner
2020-08-19 18:48 ` [PATCH 3/8] mm: Optimise madvise WILLNEED Matthew Wilcox (Oracle)
2020-08-26 14:23   ` Johannes Weiner
2020-08-19 18:48 ` [PATCH 4/8] proc: Optimise smaps for shmem entries Matthew Wilcox (Oracle)
2020-08-26 14:24   ` Johannes Weiner
2020-08-19 18:48 ` [PATCH 5/8] i915: Use find_lock_page instead of find_lock_entry Matthew Wilcox (Oracle)
2020-08-26 14:27   ` Johannes Weiner
2020-08-19 18:48 ` [PATCH 6/8] mm: Convert find_get_entry to return the head page Matthew Wilcox (Oracle)
2020-08-26 15:09   ` Johannes Weiner
2020-08-26 15:48     ` Matthew Wilcox
2020-08-19 18:48 ` [PATCH 7/8] mm: Return head page from find_lock_entry Matthew Wilcox (Oracle)
2020-08-19 18:48 ` [PATCH 8/8] mm: Hoist find_subpage call further up in pagecache_get_page Matthew Wilcox (Oracle)
2020-08-21 17:37 ` [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry William Kucharski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).