* [PATCH] mm: free idle swap cache page after COW
@ 2021-06-01  5:31 Huang Ying
  2021-06-01 11:48 ` Matthew Wilcox
  2021-06-01 13:30 ` Johannes Weiner
  0 siblings, 2 replies; 6+ messages in thread
From: Huang Ying @ 2021-06-01  5:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Johannes Weiner,
	Matthew Wilcox, Linus Torvalds, Peter Xu, Hugh Dickins,
	Mel Gorman, Rik van Riel, Andrea Arcangeli, Michal Hocko,
	Dave Hansen, Tim Chen

With commit 09854ba94c6a ("mm: do_wp_page() simplification"), after
COW, the idle swap cache page (neither the page nor the corresponding
swap entry is mapped by any process) will be left in the LRU list,
possibly even on the active list or at the head of the inactive list.
So the page reclaimer may spend considerable effort reclaiming these
effectively unused pages.

To help page reclaim, this patch tries to free the idle swap cache
page after COW.  To avoid adding much overhead to the hot COW code
path:

a) there is almost zero overhead for the non-swap case, because
   PageSwapCache() is checked first;

b) the page lock is acquired via trylock only.
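
Roughly, the combined check is the existing free_swap_cache() body in
mm/swap_state.c (the hunk below only makes the function non-static):

	/*
	 * The cheap PageSwapCache() test keeps the non-swap COW path
	 * essentially free, page_mapped() filters pages that are still
	 * in use, and trylock_page() avoids blocking the fault path.
	 */
	if (PageSwapCache(page) && !page_mapped(page) && trylock_page(page)) {
		try_to_free_swap(page);
		unlock_page(page);
	}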

To test the patch, we used the pmbench memory-accessing benchmark with
a working set larger than the available memory, on a 2-socket Intel
server with an NVMe SSD as the swap device.  Test results show that the
pmbench score increases by up to 23.8%, while both the swap cache size
and the swapin throughput decrease.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org> # use free_swap_cache()
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
---
 include/linux/swap.h | 5 +++++
 mm/memory.c          | 2 ++
 mm/swap_state.c      | 2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 032485ee7597..bb4889369a22 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -451,6 +451,7 @@ extern void __delete_from_swap_cache(struct page *page,
 extern void delete_from_swap_cache(struct page *);
 extern void clear_shadow_from_swap_cache(int type, unsigned long begin,
 				unsigned long end);
+extern void free_swap_cache(struct page *);
 extern void free_page_and_swap_cache(struct page *);
 extern void free_pages_and_swap_cache(struct page **, int);
 extern struct page *lookup_swap_cache(swp_entry_t entry,
@@ -560,6 +561,10 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
 #define free_pages_and_swap_cache(pages, nr) \
 	release_pages((pages), (nr));
 
+static inline void free_swap_cache(struct page *page)
+{
+}
+
 static inline void show_swap_cache_info(void)
 {
 }
diff --git a/mm/memory.c b/mm/memory.c
index 2b7ffcbca175..d44425820240 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3104,6 +3104,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 				munlock_vma_page(old_page);
 			unlock_page(old_page);
 		}
+		if (page_copied)
+			free_swap_cache(old_page);
 		put_page(old_page);
 	}
 	return page_copied ? VM_FAULT_WRITE : 0;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index b5a3dc8f47a1..95e391f46468 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -285,7 +285,7 @@ void clear_shadow_from_swap_cache(int type, unsigned long begin,
  * try_to_free_swap() _with_ the lock.
  * 					- Marcelo
  */
-static inline void free_swap_cache(struct page *page)
+void free_swap_cache(struct page *page)
 {
 	if (PageSwapCache(page) && !page_mapped(page) && trylock_page(page)) {
 		try_to_free_swap(page);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: free idle swap cache page after COW
  2021-06-01  5:31 [PATCH] mm: free idle swap cache page after COW Huang Ying
@ 2021-06-01 11:48 ` Matthew Wilcox
  2021-06-01 15:00   ` Johannes Weiner
  2021-06-01 13:30 ` Johannes Weiner
  1 sibling, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2021-06-01 11:48 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Johannes Weiner,
	Linus Torvalds, Peter Xu, Hugh Dickins, Mel Gorman, Rik van Riel,
	Andrea Arcangeli, Michal Hocko, Dave Hansen, Tim Chen

On Tue, Jun 01, 2021 at 01:31:43PM +0800, Huang Ying wrote:
> With commit 09854ba94c6a ("mm: do_wp_page() simplification"), after
> COW, the idle swap cache page (neither the page nor the corresponding
> swap entry is mapped by any process) will be left in the LRU list,
> possibly even on the active list or at the head of the inactive list.
> So the page reclaimer may spend considerable effort reclaiming these
> effectively unused pages.
> 
> To help page reclaim, this patch tries to free the idle swap cache
> page after COW.  To avoid adding much overhead to the hot COW code
> path:
> 
> a) there is almost zero overhead for the non-swap case, because
>    PageSwapCache() is checked first;
> 
> b) the page lock is acquired via trylock only.
> 
> To test the patch, we used the pmbench memory-accessing benchmark with
> a working set larger than the available memory, on a 2-socket Intel
> server with an NVMe SSD as the swap device.  Test results show that the
> pmbench score increases by up to 23.8%, while both the swap cache size
> and the swapin throughput decrease.

So 2 percentage points better than my original idea?  Sweet.

> diff --git a/mm/memory.c b/mm/memory.c
> index 2b7ffcbca175..d44425820240 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3104,6 +3104,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>  				munlock_vma_page(old_page);
>  			unlock_page(old_page);
>  		}
> +		if (page_copied)
> +			free_swap_cache(old_page);
>  		put_page(old_page);
>  	}
>  	return page_copied ? VM_FAULT_WRITE : 0;

Why not ...

		if (page_copied)
			free_page_and_swap_cache(old_page);
		else
			put_page(old_page);

then you don't need to expose free_swap_cache().  Or does the test for
huge_zero_page mess this up?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: free idle swap cache page after COW
  2021-06-01  5:31 [PATCH] mm: free idle swap cache page after COW Huang Ying
  2021-06-01 11:48 ` Matthew Wilcox
@ 2021-06-01 13:30 ` Johannes Weiner
  1 sibling, 0 replies; 6+ messages in thread
From: Johannes Weiner @ 2021-06-01 13:30 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Matthew Wilcox,
	Linus Torvalds, Peter Xu, Hugh Dickins, Mel Gorman, Rik van Riel,
	Andrea Arcangeli, Michal Hocko, Dave Hansen, Tim Chen

On Tue, Jun 01, 2021 at 01:31:43PM +0800, Huang Ying wrote:
> With commit 09854ba94c6a ("mm: do_wp_page() simplification"), after
> COW, the idle swap cache page (neither the page nor the corresponding
> swap entry is mapped by any process) will be left in the LRU list,
> possibly even on the active list or at the head of the inactive list.
> So the page reclaimer may spend considerable effort reclaiming these
> effectively unused pages.
> 
> To help page reclaim, this patch tries to free the idle swap cache
> page after COW.  To avoid adding much overhead to the hot COW code
> path:
> 
> a) there is almost zero overhead for the non-swap case, because
>    PageSwapCache() is checked first;
> 
> b) the page lock is acquired via trylock only.
> 
> To test the patch, we used the pmbench memory-accessing benchmark with
> a working set larger than the available memory, on a 2-socket Intel
> server with an NVMe SSD as the swap device.  Test results show that the
> pmbench score increases by up to 23.8%, while both the swap cache size
> and the swapin throughput decrease.
> 
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> # use free_swap_cache()
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Tim Chen <tim.c.chen@intel.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: free idle swap cache page after COW
  2021-06-01 11:48 ` Matthew Wilcox
@ 2021-06-01 15:00   ` Johannes Weiner
  2021-06-02  3:18       ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Johannes Weiner @ 2021-06-01 15:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel,
	Linus Torvalds, Peter Xu, Hugh Dickins, Mel Gorman, Rik van Riel,
	Andrea Arcangeli, Michal Hocko, Dave Hansen, Tim Chen

On Tue, Jun 01, 2021 at 12:48:15PM +0100, Matthew Wilcox wrote:
> On Tue, Jun 01, 2021 at 01:31:43PM +0800, Huang Ying wrote:
> > With commit 09854ba94c6a ("mm: do_wp_page() simplification"), after
> > COW, the idle swap cache page (neither the page nor the corresponding
> > swap entry is mapped by any process) will be left in the LRU list,
> > possibly even on the active list or at the head of the inactive list.
> > So the page reclaimer may spend considerable effort reclaiming these
> > effectively unused pages.
> > 
> > To help page reclaim, this patch tries to free the idle swap cache
> > page after COW.  To avoid adding much overhead to the hot COW code
> > path:
> > 
> > a) there is almost zero overhead for the non-swap case, because
> >    PageSwapCache() is checked first;
> > 
> > b) the page lock is acquired via trylock only.
> > 
> > To test the patch, we used the pmbench memory-accessing benchmark with
> > a working set larger than the available memory, on a 2-socket Intel
> > server with an NVMe SSD as the swap device.  Test results show that the
> > pmbench score increases by up to 23.8%, while both the swap cache size
> > and the swapin throughput decrease.
> 
> So 2 percentage points better than my original idea?  Sweet.
> 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 2b7ffcbca175..d44425820240 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3104,6 +3104,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> >  				munlock_vma_page(old_page);
> >  			unlock_page(old_page);
> >  		}
> > +		if (page_copied)
> > +			free_swap_cache(old_page);
> >  		put_page(old_page);
> >  	}
> >  	return page_copied ? VM_FAULT_WRITE : 0;
> 
> Why not ...
> 
> 		if (page_copied)
> 			free_page_and_swap_cache(old_page);
> 		else
> 			put_page(old_page);
> 
> then you don't need to expose free_swap_cache().  Or does the test for
> huge_zero_page mess this up?

It's free_page[s]_and_swap_cache() we should reconsider, IMO.

free_swap_cache() makes for a clean API function that does one thing,
and does it right. free_page_and_swap_cache() combines two independent
operations, which has the habit of accumulating special-case handling
for some callers that is unnecessary overhead for others (Abstraction
Inversion anti-pattern).

For example, free_page_and_swap_cache() adds an is_huge_zero_page()
check around the put_page() for the tlb batching code. This isn't
needed here. AFAICS it is also unnecessary for the other callsite,
__collapse_huge_page_copy(), where context rules out zero pages.

The common put_page() in Huang's version also makes it slightly easier
to follow the lifetime of old_page.

So I'd say exposing free_swap_cache() is a good move, for this patch
and in general.
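
For reference, free_page_and_swap_cache() in the current tree looks
roughly like this; the is_huge_zero_page() special case is the one
described above:

	void free_page_and_swap_cache(struct page *page)
	{
		/* Drop the swap cache reference if this was the last user. */
		free_swap_cache(page);
		/*
		 * Drop the page itself, except for the huge zero page, which
		 * the TLB batching code can pass in and must not be freed.
		 */
		if (!is_huge_zero_page(page))
			put_page(page);
	}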

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: free idle swap cache page after COW
  2021-06-01 15:00   ` Johannes Weiner
@ 2021-06-02  3:18       ` Linus Torvalds
  0 siblings, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2021-06-02  3:18 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Matthew Wilcox, Huang Ying, Andrew Morton, Linux-MM,
	Linux Kernel Mailing List, Peter Xu, Hugh Dickins, Mel Gorman,
	Rik van Riel, Andrea Arcangeli, Michal Hocko, Dave Hansen,
	Tim Chen

On Tue, Jun 1, 2021 at 5:00 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> It's free_page[s]_and_swap_cache() we should reconsider, IMO.
>
> free_swap_cache() makes for a clean API function that does one thing,
> and does it right. free_page_and_swap_cache() combines two independent
> operations, which has the habit of accumulating special-case handling
> for some callers that is unnecessary overhead for others (Abstraction
> Inversion anti-pattern).

Agreed. That "free_page_and_swap_cache()" function is odd. Much better
written as

        free_swap_cache();
        free_page();

because there's no real advantage to trying to merge the two.

It's not like the merged function can actually do any optimization
based on the merging, it just does the above anyway - except it then
has that extra odd huge_zero_page test that makes no sense
whatsoever.

So if anything we should try to get rid of uses of that odd
free_page_and_swap_cache() thing. But it's not exactly urgent.

             Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-06-02  3:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-01  5:31 [PATCH] mm: free idle swap cache page after COW Huang Ying
2021-06-01 11:48 ` Matthew Wilcox
2021-06-01 15:00   ` Johannes Weiner
2021-06-02  3:18     ` Linus Torvalds
2021-06-01 13:30 ` Johannes Weiner
