* [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
From: Huang Ying @ 2021-05-19  1:33 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Matthew Wilcox,
	Linus Torvalds, Peter Xu, Hugh Dickins, Johannes Weiner,
	Mel Gorman, Rik van Riel, Andrea Arcangeli, Michal Hocko,
	Dave Hansen, Tim Chen

With commit 09854ba94c6a ("mm: do_wp_page() simplification"), after
COW, an idle swap cache page (one where neither the page nor the
corresponding swap entry is mapped by any process) is left at its
original position in the LRU list.  That position may be in the
active list or at the head of the inactive list, so vmscan may incur
extra overhead or take more time to reclaim these actually unused
pages.

To help page reclaim, this patch tries to move the idle swap cache
page to the tail of the inactive LRU list after COW.  To avoid
introducing much overhead into the hot COW code path, all locks are
acquired via trylock.

To test the patch, we used the pmbench memory accessing benchmark
with a working set larger than the available memory on a 2-socket
Intel server with an NVMe SSD as the swap device.  Test results show
that the pmbench score increases by up to 21.8%, while the swap cache
size and the swap-in throughput both decrease.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
---
 include/linux/memcontrol.h | 10 ++++++++++
 include/linux/swap.h       |  3 +++
 mm/memcontrol.c            | 12 ++++++++++++
 mm/memory.c                |  5 +++++
 mm/swapfile.c              | 29 +++++++++++++++++++++++++++++
 5 files changed, 59 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0ce97eff79e2..68956db13772 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -761,6 +761,7 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm);
 
 struct lruvec *lock_page_lruvec(struct page *page);
 struct lruvec *lock_page_lruvec_irq(struct page *page);
+struct lruvec *trylock_page_lruvec_irq(struct page *page);
 struct lruvec *lock_page_lruvec_irqsave(struct page *page,
 						unsigned long *flags);
 
@@ -1251,6 +1252,15 @@ static inline struct lruvec *lock_page_lruvec_irq(struct page *page)
 	return &pgdat->__lruvec;
 }
 
+static inline struct lruvec *trylock_page_lruvec_irq(struct page *page)
+{
+	struct pglist_data *pgdat = page_pgdat(page);
+
+	if (spin_trylock_irq(&pgdat->__lruvec.lru_lock))
+		return &pgdat->__lruvec;
+	return NULL;
+}
+
 static inline struct lruvec *lock_page_lruvec_irqsave(struct page *page,
 		unsigned long *flagsp)
 {
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 46d51d058d05..d344b0fa7925 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -504,6 +504,7 @@ extern struct swap_info_struct *page_swap_info(struct page *);
 extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
 extern bool reuse_swap_page(struct page *, int *);
 extern int try_to_free_swap(struct page *);
+extern void try_to_free_idle_swapcache(struct page *page);
 struct backing_dev_info;
 extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
 extern void exit_swap_address_space(unsigned int type);
@@ -668,6 +669,8 @@ static inline int try_to_free_swap(struct page *page)
 	return 0;
 }
 
+static inline void try_to_free_idle_swapcache(struct page *page) {}
+
 static inline swp_entry_t get_swap_page(struct page *page)
 {
 	swp_entry_t entry;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index db29b96f7311..e3e813bfebe2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1213,6 +1213,18 @@ struct lruvec *lock_page_lruvec_irq(struct page *page)
 	return lruvec;
 }
 
+struct lruvec *trylock_page_lruvec_irq(struct page *page)
+{
+	struct lruvec *lruvec;
+
+	lruvec = mem_cgroup_page_lruvec(page);
+	if (spin_trylock_irq(&lruvec->lru_lock)) {
+		lruvec_memcg_debug(lruvec, page);
+		return lruvec;
+	}
+	return NULL;
+}
+
 struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags)
 {
 	struct lruvec *lruvec;
diff --git a/mm/memory.c b/mm/memory.c
index b83f734c4e1d..2b6847f4c03e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3012,6 +3012,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 				munlock_vma_page(old_page);
 			unlock_page(old_page);
 		}
+		if (page_copied && PageSwapCache(old_page) &&
+		    !page_mapped(old_page) && trylock_page(old_page)) {
+			try_to_free_idle_swapcache(old_page);
+			unlock_page(old_page);
+		}
 		put_page(old_page);
 	}
 	return page_copied ? VM_FAULT_WRITE : 0;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2aad85751991..e0dd8937de4e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -40,6 +40,7 @@
 #include <linux/swap_slots.h>
 #include <linux/sort.h>
 #include <linux/completion.h>
+#include <linux/mm_inline.h>
 
 #include <asm/tlbflush.h>
 #include <linux/swapops.h>
@@ -1788,6 +1789,34 @@ int try_to_free_swap(struct page *page)
 	return 1;
 }
 
+void try_to_free_idle_swapcache(struct page *page)
+{
+	struct lruvec *lruvec;
+	swp_entry_t entry;
+
+	if (!PageSwapCache(page))
+		return;
+	if (PageWriteback(page))
+		return;
+	if (!PageLRU(page))
+		return;
+	if (page_mapped(page))
+		return;
+	entry.val = page_private(page);
+	if (__swap_count(entry))
+		return;
+
+	lruvec = trylock_page_lruvec_irq(page);
+	if (!lruvec)
+		return;
+
+	del_page_from_lru_list(page, lruvec);
+	ClearPageActive(page);
+	ClearPageReferenced(page);
+	add_page_to_lru_list_tail(page, lruvec);
+
+	unlock_page_lruvec_irq(lruvec);
+}
 /*
  * Free the swap entry like above, but also try to
  * free the page cache entry if it is the last user.
-- 
2.30.2



* Re: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
From: Rik van Riel @ 2021-05-19  2:12 UTC
  To: Huang Ying, Andrew Morton
  Cc: linux-mm, linux-kernel, Matthew Wilcox, Linus Torvalds, Peter Xu,
	Hugh Dickins, Johannes Weiner, Mel Gorman, Andrea Arcangeli,
	Michal Hocko, Dave Hansen, Tim Chen

On Wed, 2021-05-19 at 09:33 +0800, Huang Ying wrote:

> To test the patch, we used the pmbench memory accessing benchmark
> with a working set larger than the available memory on a 2-socket
> Intel server with an NVMe SSD as the swap device.  Test results show
> that the pmbench score increases by up to 21.8%, while the swap cache
> size and the swap-in throughput both decrease.

Nice!

> +++ b/mm/memory.c
> @@ -3012,6 +3012,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>  				munlock_vma_page(old_page);
>  			unlock_page(old_page);
>  		}
> +		if (page_copied && PageSwapCache(old_page) &&
> +		    !page_mapped(old_page) && trylock_page(old_page)) {
> +			try_to_free_idle_swapcache(old_page);
> +			unlock_page(old_page);
> +		}

That's quite the if condition!

Would it make sense to move some of the tests, as well
as the trylock and unlock into try_to_free_idle_swapcache()
itself?

Especially considering that page_mapped is already tested
in that function, too...

>  		put_page(old_page);
>  	}



-- 
All Rights Reversed.



* Re: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
From: Linus Torvalds @ 2021-05-19  3:17 UTC
  To: Huang Ying
  Cc: Andrew Morton, Linux-MM, Linux Kernel Mailing List,
	Matthew Wilcox, Peter Xu, Hugh Dickins, Johannes Weiner,
	Mel Gorman, Rik van Riel, Andrea Arcangeli, Michal Hocko,
	Dave Hansen, Tim Chen

On Tue, May 18, 2021 at 3:33 PM Huang Ying <ying.huang@intel.com> wrote:
>
> With commit 09854ba94c6a ("mm: do_wp_page() simplification"), after
> COW, an idle swap cache page (one where neither the page nor the
> corresponding swap entry is mapped by any process) is left at its
> original position in the LRU list.  That position may be in the
> active list or at the head of the inactive list, so vmscan may incur
> extra overhead or take more time to reclaim these actually unused
> pages.

This looks sensible to me (and numbers talk!), but as Rik says, it
would probably be a good idea to move the trylock_page()/unlock_page()
into try_to_free_idle_swapcache(), and that would make the calling
side a whole lot cleaner and easier to read.

                    Linus


* Re: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
From: Linus Torvalds @ 2021-05-19  3:25 UTC
  To: Huang Ying
  Cc: Andrew Morton, Linux-MM, Linux Kernel Mailing List,
	Matthew Wilcox, Peter Xu, Hugh Dickins, Johannes Weiner,
	Mel Gorman, Rik van Riel, Andrea Arcangeli, Michal Hocko,
	Dave Hansen, Tim Chen

On Tue, May 18, 2021 at 5:17 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This looks sensible to me (and numbers talk!), but as Rik says, it
> would probably be a good idea to move the trylock_page()/unlock_page()
> into try_to_free_idle_swapcache(), and that would make the calling
> side a whole lot cleaner and easier to read.

To keep the error handling simple, and keep that "if that didn't work,
just return" logic you had, doing it as two functions like:

  static inline void locked_try_to_free_idle_swapcache(struct page *page)
  { .. your current try_to_free_idle_swapcache() .. }

  void try_to_free_idle_swapcache(struct page *page)
  {
        if (trylock_page(page)) {
                locked_try_to_free_idle_swapcache(page);
                unlock_page(page);
        }
  }

would keep that readability and simplicity.

And then the wp_page_copy() code ends up being

        if (page_copied && PageSwapCache(old_page) && !page_mapped(old_page))
                      try_to_free_idle_swapcache(old_page);

which looks pretty sensible to me: if we copied the page, and the old
page is a no longer mapped swap cache page, let's try to free it.

That's still a hell of a long conditional, partly because of those
long names. But at least it's conceptually fairly straightforward and
easy to understand what's going on.

No?

               Linus


* Re: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
From: Huang, Ying @ 2021-05-19  4:49 UTC
  To: Linus Torvalds
  Cc: Andrew Morton, Linux-MM, Linux Kernel Mailing List,
	Matthew Wilcox, Peter Xu, Hugh Dickins, Johannes Weiner,
	Mel Gorman, Rik van Riel, Andrea Arcangeli, Michal Hocko,
	Dave Hansen, Tim Chen

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, May 18, 2021 at 5:17 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> This looks sensible to me (and numbers talk!), but as Rik says, it
>> would probably be a good idea to move the trylock_page()/unlock_page()
>> into try_to_free_idle_swapcache(), and that would make the calling
>> side a whole lot cleaner and easier to read.
>
> To keep the error handling simple, and keep that "if that didn't work,
> just return" logic you had, doing it as two functions like:
>
>   static inline void locked_try_to_free_idle_swapcache(struct page *page)
>   { .. your current try_to_free_idle_swapcache() .. }
>
>   void try_to_free_idle_swapcache(struct page *page)
>   {
>         if (trylock_page(page)) {
>                 locked_try_to_free_idle_swapcache(page);
>                 unlock_page(page);
>         }
>   }
>
> would keep that readability and simplicity.
>
> And then the wp_page_copy() code ends up being
>
>         if (page_copied && PageSwapCache(old_page) && !page_mapped(old_page))
>                       try_to_free_idle_swapcache(old_page);
>
> which looks pretty sensible to me: if we copied the page, and the old
> page is a no longer mapped swap cache page, let's try to free it.
>
> That's still a hell of a long conditional, partly because of those
> long names. But at least it's conceptually fairly straightforward and
> easy to understand what's going on.

Thanks!  That looks much better.  I will do that in the next version.

Best Regards,
Huang, Ying


* Re: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
From: Huang, Ying @ 2021-05-19  4:56 UTC
  To: Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel, Matthew Wilcox,
	Linus Torvalds, Peter Xu, Hugh Dickins, Johannes Weiner,
	Mel Gorman, Andrea Arcangeli, Michal Hocko, Dave Hansen,
	Tim Chen

Rik van Riel <riel@surriel.com> writes:

> On Wed, 2021-05-19 at 09:33 +0800, Huang Ying wrote:
>
>> To test the patch, we used the pmbench memory accessing benchmark
>> with a working set larger than the available memory on a 2-socket
>> Intel server with an NVMe SSD as the swap device.  Test results show
>> that the pmbench score increases by up to 21.8%, while the swap cache
>> size and the swap-in throughput both decrease.
>
> Nice!
>
>> +++ b/mm/memory.c
>> @@ -3012,6 +3012,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>>  				munlock_vma_page(old_page);
>>  			unlock_page(old_page);
>>  		}
>> +		if (page_copied && PageSwapCache(old_page) &&
>> +		    !page_mapped(old_page) && trylock_page(old_page)) {
>> +			try_to_free_idle_swapcache(old_page);
>> +			unlock_page(old_page);
>> +		}
>
> That's quite the if condition!
>
> Would it make sense to move some of the tests, as well
> as the trylock and unlock into try_to_free_idle_swapcache()
> itself?

Sure.  Will put trylock/unlock into try_to_free_idle_swapcache() as
suggested by Linus.

> Especially considering that page_mapped is already tested
> in that function, too...

The two page_mapped() tests are intentional.  The first one is a quick
check with the page unlocked; the second one confirms the result with
the page locked.  While the page is unlocked, the swap count may be
converted into a map count or vice versa, so the unlocked check alone
is not conclusive.
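
For illustration, the intended pattern looks roughly like this (a
minimal sketch with a made-up wrapper name, not the literal patch
code):

    /*
     * Sketch only; assumes the usual <linux/mm.h> and <linux/swap.h>
     * helpers.  maybe_free_idle_swapcache() is a hypothetical name.
     */
    static void maybe_free_idle_swapcache(struct page *page)
    {
            /* Quick but racy pre-check while the page is unlocked. */
            if (!PageSwapCache(page) || page_mapped(page))
                    return;
            if (!trylock_page(page))
                    return;
            /*
             * try_to_free_idle_swapcache() re-checks page_mapped()
             * under the page lock, where the swap count / map count
             * state can no longer change underneath us.
             */
            try_to_free_idle_swapcache(page);
            unlock_page(page);
    }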

Best Regards,
Huang, Ying



* Re: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
From: Johannes Weiner @ 2021-05-19 14:49 UTC
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Matthew Wilcox,
	Linus Torvalds, Peter Xu, Hugh Dickins, Mel Gorman, Rik van Riel,
	Andrea Arcangeli, Michal Hocko, Dave Hansen, Tim Chen

On Wed, May 19, 2021 at 09:33:13AM +0800, Huang Ying wrote:
> diff --git a/mm/memory.c b/mm/memory.c
> index b83f734c4e1d..2b6847f4c03e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3012,6 +3012,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>  				munlock_vma_page(old_page);
>  			unlock_page(old_page);
>  		}
> +		if (page_copied && PageSwapCache(old_page) &&
> +		    !page_mapped(old_page) && trylock_page(old_page)) {
> +			try_to_free_idle_swapcache(old_page);
> +			unlock_page(old_page);

If there are no more swap or pte references, can we just attempt to
free the page right away, like we do during regular unmap?

		if (page_copied)
			free_swap_cache(old_page);
		put_page(old_page);


* Re: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
From: Huang, Ying @ 2021-05-20  1:22 UTC
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, linux-kernel, Matthew Wilcox,
	Linus Torvalds, Peter Xu, Hugh Dickins, Mel Gorman, Rik van Riel,
	Andrea Arcangeli, Michal Hocko, Dave Hansen, Tim Chen

Johannes Weiner <hannes@cmpxchg.org> writes:

> On Wed, May 19, 2021 at 09:33:13AM +0800, Huang Ying wrote:
>> diff --git a/mm/memory.c b/mm/memory.c
>> index b83f734c4e1d..2b6847f4c03e 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3012,6 +3012,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>>  				munlock_vma_page(old_page);
>>  			unlock_page(old_page);
>>  		}
>> +		if (page_copied && PageSwapCache(old_page) &&
>> +		    !page_mapped(old_page) && trylock_page(old_page)) {
>> +			try_to_free_idle_swapcache(old_page);
>> +			unlock_page(old_page);
>
> If there are no more swap or pte references, can we just attempt to
> free the page right away, like we do during regular unmap?
>
> 		if (page_copied)
> 			free_swap_cache(old_page);
> 		put_page(old_page);

A previous version of the patch does roughly this.

https://lore.kernel.org/lkml/20210113024241.179113-1-ying.huang@intel.com/

But Linus has concerns with the overhead introduced in the hot COW path.

Another possibility is to move the idle swap cache page to the tail of
the file LRU list.  But the question is how to identify the page.

Best Regards,
Huang, Ying


* Re: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
From: Johannes Weiner @ 2021-05-20  1:46 UTC
  To: Huang, Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Matthew Wilcox,
	Linus Torvalds, Peter Xu, Hugh Dickins, Mel Gorman, Rik van Riel,
	Andrea Arcangeli, Michal Hocko, Dave Hansen, Tim Chen

On Thu, May 20, 2021 at 09:22:45AM +0800, Huang, Ying wrote:
> Johannes Weiner <hannes@cmpxchg.org> writes:
> 
> > On Wed, May 19, 2021 at 09:33:13AM +0800, Huang Ying wrote:
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index b83f734c4e1d..2b6847f4c03e 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -3012,6 +3012,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> >>  				munlock_vma_page(old_page);
> >>  			unlock_page(old_page);
> >>  		}
> >> +		if (page_copied && PageSwapCache(old_page) &&
> >> +		    !page_mapped(old_page) && trylock_page(old_page)) {
> >> +			try_to_free_idle_swapcache(old_page);
> >> +			unlock_page(old_page);
> >
> > If there are no more swap or pte references, can we just attempt to
> > free the page right away, like we do during regular unmap?
> >
> > 		if (page_copied)
> > 			free_swap_cache(old_page);
> > 		put_page(old_page);
> 
> A previous version of the patch does roughly this.
> 
> https://lore.kernel.org/lkml/20210113024241.179113-1-ying.huang@intel.com/
> 
> But Linus has concerns with the overhead introduced in the hot COW path.

Sorry, I had missed that thread.

It sounds like there were the same concerns about the LRU shuffling
overhead in the COW path. Now we have numbers for that, but not the
free_swap_cache version. Would you be able to run the numbers for that
as well? It would be interesting to see how much the additional code
complexity buys us.

> Another possibility is to move the idle swap cache page to the tail of
> the file LRU list.  But the question is how to identify the page.

The LRU type is identified by PG_swapbacked, and we do clear that for
anon pages to implement MADV_FREE. It may work here too. But I'm
honestly a bit skeptical about the ROI on this...
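
For reference, the lazyfree LRU move looks roughly like this (a
simplified sketch from memory with a made-up function name, not
verbatim kernel code):

    /*
     * Clearing PG_swapbacked makes the page count as a file page for
     * LRU purposes, so re-adding it lands on the file LRU.  Caller
     * holds the lruvec lock.
     */
    static void lazyfree_lru_move(struct page *page, struct lruvec *lruvec)
    {
            if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
                !PageSwapCache(page) && !PageUnevictable(page)) {
                    del_page_from_lru_list(page, lruvec);
                    ClearPageActive(page);
                    ClearPageReferenced(page);
                    ClearPageSwapBacked(page);
                    add_page_to_lru_list(page, lruvec);
            }
    }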


* Re: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
From: Huang, Ying @ 2021-05-20  1:59 UTC
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, linux-kernel, Matthew Wilcox,
	Linus Torvalds, Peter Xu, Hugh Dickins, Mel Gorman, Rik van Riel,
	Andrea Arcangeli, Michal Hocko, Dave Hansen, Tim Chen

Johannes Weiner <hannes@cmpxchg.org> writes:

> On Thu, May 20, 2021 at 09:22:45AM +0800, Huang, Ying wrote:
>> Johannes Weiner <hannes@cmpxchg.org> writes:
>> 
>> > On Wed, May 19, 2021 at 09:33:13AM +0800, Huang Ying wrote:
>> >> diff --git a/mm/memory.c b/mm/memory.c
>> >> index b83f734c4e1d..2b6847f4c03e 100644
>> >> --- a/mm/memory.c
>> >> +++ b/mm/memory.c
>> >> @@ -3012,6 +3012,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>> >>  				munlock_vma_page(old_page);
>> >>  			unlock_page(old_page);
>> >>  		}
>> >> +		if (page_copied && PageSwapCache(old_page) &&
>> >> +		    !page_mapped(old_page) && trylock_page(old_page)) {
>> >> +			try_to_free_idle_swapcache(old_page);
>> >> +			unlock_page(old_page);
>> >
>> > If there are no more swap or pte references, can we just attempt to
>> > free the page right away, like we do during regular unmap?
>> >
>> > 		if (page_copied)
>> > 			free_swap_cache(old_page);
>> > 		put_page(old_page);
>> 
>> A previous version of the patch does roughly this.
>> 
>> https://lore.kernel.org/lkml/20210113024241.179113-1-ying.huang@intel.com/
>> 
>> But Linus has concerns with the overhead introduced in the hot COW path.
>
> Sorry, I had missed that thread.
>
> It sounds like there were the same concerns about the LRU shuffling
> overhead in the COW path. Now we have numbers for that, but not the
> free_swap_cache version. Would you be able to run the numbers for that
> as well? It would be interesting to see how much the additional code
> complexity buys us.

The number for which workload?  The workload that is used to evaluate
this patch?

>> Another possibility is to move the idle swap cache page to the tail of
>> the file LRU list.  But the question is how to identify the page.
>
> The LRU type is identified by PG_swapbacked, and we do clear that for
> anon pages to implement MADV_FREE. It may work here too. But I'm
> honestly a bit skeptical about the ROI on this...

The definition of PageSwapCache() is

static __always_inline int PageSwapCache(struct page *page)
{
#ifdef CONFIG_THP_SWAP
	page = compound_head(page);
#endif
	return PageSwapBacked(page) && test_bit(PG_swapcache, &page->flags);
}

So we cannot clear PG_swapbacked directly: the page would no longer
test as PageSwapCache() while it is still in the swap cache.

Best Regards,
Huang, Ying


* Re: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
From: Johannes Weiner @ 2021-05-20 17:49 UTC
  To: Huang, Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Matthew Wilcox,
	Linus Torvalds, Peter Xu, Hugh Dickins, Mel Gorman, Rik van Riel,
	Andrea Arcangeli, Michal Hocko, Dave Hansen, Tim Chen

On Thu, May 20, 2021 at 09:59:15AM +0800, Huang, Ying wrote:
> Johannes Weiner <hannes@cmpxchg.org> writes:
> 
> > On Thu, May 20, 2021 at 09:22:45AM +0800, Huang, Ying wrote:
> >> Johannes Weiner <hannes@cmpxchg.org> writes:
> >> 
> >> > On Wed, May 19, 2021 at 09:33:13AM +0800, Huang Ying wrote:
> >> >> diff --git a/mm/memory.c b/mm/memory.c
> >> >> index b83f734c4e1d..2b6847f4c03e 100644
> >> >> --- a/mm/memory.c
> >> >> +++ b/mm/memory.c
> >> >> @@ -3012,6 +3012,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> >> >>  				munlock_vma_page(old_page);
> >> >>  			unlock_page(old_page);
> >> >>  		}
> >> >> +		if (page_copied && PageSwapCache(old_page) &&
> >> >> +		    !page_mapped(old_page) && trylock_page(old_page)) {
> >> >> +			try_to_free_idle_swapcache(old_page);
> >> >> +			unlock_page(old_page);
> >> >
> >> > If there are no more swap or pte references, can we just attempt to
> >> > free the page right away, like we do during regular unmap?
> >> >
> >> > 		if (page_copied)
> >> > 			free_swap_cache(old_page);
> >> > 		put_page(old_page);
> >> 
> >> A previous version of the patch does roughly this.
> >> 
> >> https://lore.kernel.org/lkml/20210113024241.179113-1-ying.huang@intel.com/
> >> 
> >> But Linus has concerns with the overhead introduced in the hot COW path.
> >
> > Sorry, I had missed that thread.
> >
> > It sounds like there were the same concerns about the LRU shuffling
> > overhead in the COW path. Now we have numbers for that, but not the
> > free_swap_cache version. Would you be able to run the numbers for that
> > as well? It would be interesting to see how much the additional code
> > complexity buys us.
> 
> The number for which workload?  The workload that is used to evaluate
> this patch?

Yeah, the pmbench one from the changelog.


* Re: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
  2021-05-20 17:49         ` Johannes Weiner
@ 2021-05-21  2:05           ` Huang, Ying
  0 siblings, 0 replies; 12+ messages in thread
From: Huang, Ying @ 2021-05-21  2:05 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, linux-kernel, Matthew Wilcox,
	Linus Torvalds, Peter Xu, Hugh Dickins, Mel Gorman, Rik van Riel,
	Andrea Arcangeli, Michal Hocko, Dave Hansen, Tim Chen

Johannes Weiner <hannes@cmpxchg.org> writes:

> On Thu, May 20, 2021 at 09:59:15AM +0800, Huang, Ying wrote:
>> Johannes Weiner <hannes@cmpxchg.org> writes:
>> 
>> > On Thu, May 20, 2021 at 09:22:45AM +0800, Huang, Ying wrote:
>> >> Johannes Weiner <hannes@cmpxchg.org> writes:
>> >> 
>> >> > On Wed, May 19, 2021 at 09:33:13AM +0800, Huang Ying wrote:
>> >> >> diff --git a/mm/memory.c b/mm/memory.c
>> >> >> index b83f734c4e1d..2b6847f4c03e 100644
>> >> >> --- a/mm/memory.c
>> >> >> +++ b/mm/memory.c
>> >> >> @@ -3012,6 +3012,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>> >> >>  				munlock_vma_page(old_page);
>> >> >>  			unlock_page(old_page);
>> >> >>  		}
>> >> >> +		if (page_copied && PageSwapCache(old_page) &&
>> >> >> +		    !page_mapped(old_page) && trylock_page(old_page)) {
>> >> >> +			try_to_free_idle_swapcache(old_page);
>> >> >> +			unlock_page(old_page);
>> >> >
>> >> > If there are no more swap or pte references, can we just attempt to
>> >> > free the page right away, like we do during regular unmap?
>> >> >
>> >> > 		if (page_copied)
>> >> > 			free_swap_cache(old_page);
>> >> > 		put_page(old_page);
>> >> 
>> >> A previous version of the patch does roughly this.
>> >> 
>> >> https://lore.kernel.org/lkml/20210113024241.179113-1-ying.huang@intel.com/
>> >> 
>> >> But Linus has concerns with the overhead introduced in the hot COW path.
>> >
>> > Sorry, I had missed that thread.
>> >
>> > It sounds like there were the same concerns about the LRU shuffling
>> > overhead in the COW path. Now we have numbers for that, but not the
>> > free_swap_cache version. Would you be able to run the numbers for that
>> > as well? It would be interesting to see how much the additional code
>> > complexity buys us.
>> 
>> The number for which workload?  The workload that is used to evaluate
>> this patch?
>
> Yeah, the pmbench one from the changelog.

Sure.  I have rebased the original patch that frees the idle swap cache
directly and rerun the test.  The results show that the pmbench score
of freeing directly is a little better than that of moving to the tail
of the LRU: the score increases by about 3.6%.  I think this is
expected, because we eventually need to free the page even if we move
the idle swap cache to the tail of the LRU.

Best Regards,
Huang, Ying
