linux-mm.kvack.org archive mirror
* [PATCH v3] mm: swap: properly update readahead statistics in unuse_pte_range()
@ 2020-04-16 18:01 Andrea Righi
  2020-04-17  3:01 ` Huang, Ying
  0 siblings, 1 reply; 5+ messages in thread
From: Andrea Righi @ 2020-04-16 18:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Huang Ying, Minchan Kim, Anchal Agarwal, linux-mm, linux-kernel

In unuse_pte_range() we blindly swap in pages without checking whether
the swap entry is already present in the swap cache.

By doing this, the hit/miss ratio used by the swap readahead heuristic
is not properly updated (cache hits are never accounted) and this leads
to sub-optimal performance during swapoff.
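
For context, the heuristic in question (swapin_nr_pages() in
mm/swap_state.c) sizes the next readahead window from the number of
swap cache hits recorded, by lookup_swap_cache(), since the previous
window was read.  A simplified sketch of the idea (the actual kernel
code differs in the details):

  /*
   * Simplified sketch of the window sizing done by swapin_nr_pages()/
   * __swapin_nr_pages(): "hits" is the number of swap cache hits seen
   * since the last readahead window, "prev_win" the previous window
   * size, "max_pages" the limit derived from the page-cluster sysctl.
   */
  static unsigned int readahead_window(unsigned int hits,
                                       unsigned int max_pages,
                                       unsigned int prev_win)
  {
          unsigned int pages = hits + 2;

          if (pages == 2) {
                  /* no hits at all: collapse to a minimal window */
                  pages = 1;
          } else {
                  /* round up to the next power of two */
                  unsigned int roundup = 4;

                  while (roundup < pages)
                          roundup <<= 1;
                  pages = roundup;
          }
          if (pages > max_pages)
                  pages = max_pages;
          /* but don't shrink the window too fast */
          if (pages < prev_win / 2)
                  pages = prev_win / 2;

          return pages;
  }

If no hits are ever recorded, the window keeps collapsing to 1 or 2
pages, which matches the distribution observed below.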

Tracing the distribution of the readahead sizes returned by the swap
readahead heuristic during swapoff shows that a small readahead size is
used most of the time, as if we had only misses (this happens with both
cluster and vma readahead), for example:

r::swapin_nr_pages(unsigned long offset):unsigned long:$retval
        COUNT      EVENT
        36948      $retval = 8
        44151      $retval = 4
        49290      $retval = 1
        527771     $retval = 2
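
(The probe specifier in the header above appears to be bcc's argdist
syntax; assuming a kernel with kprobes and the bcc tools available, a
distribution like this should be reproducible with something along the
lines of:

  # argdist -C 'r::swapin_nr_pages(unsigned long offset):unsigned long:$retval'

left running for the duration of the swapoff.)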

Checking first whether the swap entry is present in the swap cache,
instead, allows the readahead statistics to be properly updated, and
the heuristic behaves better during swapoff, selecting a bigger
readahead size:

r::swapin_nr_pages(unsigned long offset):unsigned long:$retval
        COUNT      EVENT
        1618       $retval = 1
        4960       $retval = 2
        41315      $retval = 4
        103521     $retval = 8

In terms of swapoff performance the result is the following:

Testing environment
===================

 - Host:
   CPU: 1.8GHz Intel Core i7-8565U (quad-core, 8MB cache)
   SSD: PC401 NVMe SK hynix 512GB
   MEM: 16GB

 - Guest (kvm):
   8GB of RAM
   virtio block driver
   16GB swap file on ext4 (/swapfile)

Test case
=========
 - allocate 85% of memory (see the sketch after this list)
 - `systemctl hibernate` to force all the pages to be swapped out to the
   swap file
 - resume the system
 - measure the time that swapoff takes to complete:
   # /usr/bin/time swapoff /swapfile
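
A minimal allocator for the first step could look like the following (a
hypothetical illustration, not necessarily the exact tool used to
produce the numbers below):

  /* alloc85.c: allocate ~85% of RAM and fault in every page */
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/sysinfo.h>

  int main(void)
  {
          struct sysinfo si;
          size_t len;
          char *buf;

          if (sysinfo(&si))
                  return 1;
          len = (size_t)si.totalram * si.mem_unit / 100 * 85;
          buf = malloc(len);
          if (!buf)
                  return 1;
          memset(buf, 0xaa, len); /* touch every page */
          pause();                /* keep pages mapped until hibernate */
          return 0;
  }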

Result (swapoff time)
=====================
                  5.6 vanilla   5.6 w/ this patch
                  -----------   -----------------
cluster-readahead      22.09s              12.19s
    vma-readahead      18.20s              15.33s

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
---
Changes in v3:
 - properly update swap readahead statistics instead of forcing a
   fixed-size readahead

 mm/swapfile.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 5871a2aa86a5..f8bf926c9c8f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1937,10 +1937,14 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 
 		pte_unmap(pte);
 		swap_map = &si->swap_map[offset];
-		vmf.vma = vma;
-		vmf.address = addr;
-		vmf.pmd = pmd;
-		page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, &vmf);
+		page = lookup_swap_cache(entry, vma, addr);
+		if (!page) {
+			vmf.vma = vma;
+			vmf.address = addr;
+			vmf.pmd = pmd;
+			page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
+						&vmf);
+		}
 		if (!page) {
 			if (*swap_map == 0 || *swap_map == SWAP_MAP_BAD)
 				goto try_next;
-- 
2.25.1




* Re: [PATCH v3] mm: swap: properly update readahead statistics in unuse_pte_range()
  2020-04-16 18:01 [PATCH v3] mm: swap: properly update readahead statistics in unuse_pte_range() Andrea Righi
@ 2020-04-17  3:01 ` Huang, Ying
  2020-04-17  4:06   ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Huang, Ying @ 2020-04-17  3:01 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, Minchan Kim, Anchal Agarwal, linux-mm, linux-kernel

Andrea Righi <andrea.righi@canonical.com> writes:

> In unuse_pte_range() we blindly swap in pages without checking whether
> the swap entry is already present in the swap cache.
>
> By doing this, the hit/miss ratio used by the swap readahead heuristic
> is not properly updated (cache hits are never accounted) and this leads
> to sub-optimal performance during swapoff.

It's more important to describe why we need this patch in the patch
description.  So, please add some information about your use case.  And
please focus on the technical part instead of the business part.

> Tracing the distribution of the readahead sizes returned by the swap
> readahead heuristic during swapoff shows that a small readahead size is
> used most of the time, as if we had only misses (this happens with both
> cluster and vma readahead), for example:
>
> r::swapin_nr_pages(unsigned long offset):unsigned long:$retval
>         COUNT      EVENT
>         36948      $retval = 8
>         44151      $retval = 4
>         49290      $retval = 1
>         527771     $retval = 2
>
> Checking first whether the swap entry is present in the swap cache,
> instead, allows the readahead statistics to be properly updated, and
> the heuristic behaves better during swapoff, selecting a bigger
> readahead size:
>
> r::swapin_nr_pages(unsigned long offset):unsigned long:$retval
>         COUNT      EVENT
>         1618       $retval = 1
>         4960       $retval = 2
>         41315      $retval = 4
>         103521     $retval = 8
>
> In terms of swapoff performance the result is the following:
>
> Testing environment
> ===================
>
>  - Host:
>    CPU: 1.8GHz Intel Core i7-8565U (quad-core, 8MB cache)
>    SSD: PC401 NVMe SK hynix 512GB
>    MEM: 16GB
>
>  - Guest (kvm):
>    8GB of RAM
>    virtio block driver
>    16GB swap file on ext4 (/swapfile)
>
> Test case
> =========
>  - allocate 85% of memory
>  - `systemctl hibernate` to force all the pages to be swapped out to the
>    swap file
>  - resume the system
>  - measure the time that swapoff takes to complete:
>    # /usr/bin/time swapoff /swapfile
>
> Result (swapoff time)
> =====================
>                   5.6 vanilla   5.6 w/ this patch
>                   -----------   -----------------
> cluster-readahead      22.09s              12.19s
>     vma-readahead      18.20s              15.33s
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>

Thanks!  But you don't need to do this.  You can add my Reviewed-by after
we have finished the work on the patch description.

Best Regards,
Huang, Ying



* Re: [PATCH v3] mm: swap: properly update readahead statistics in unuse_pte_range()
  2020-04-17  3:01 ` Huang, Ying
@ 2020-04-17  4:06   ` Andrew Morton
  2020-04-17  5:18     ` Huang, Ying
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2020-04-17  4:06 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrea Righi, Minchan Kim, Anchal Agarwal, linux-mm, linux-kernel

On Fri, 17 Apr 2020 11:01:22 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:

> Andrea Righi <andrea.righi@canonical.com> writes:
> 
> > In unuse_pte_range() we blindly swap in pages without checking whether
> > the swap entry is already present in the swap cache.
> >
> > By doing this, the hit/miss ratio used by the swap readahead heuristic
> > is not properly updated (cache hits are never accounted) and this leads
> > to sub-optimal performance during swapoff.
> 
> It's more important to describe why we need this patch in the patch
> description.  So, please add some information about your use case.  And
> please focus on the technical part instead of the business part.

Confused.  I thought the changelog was quite good.  If "business part"
means "end user effect of the patch" then that's a very important
thing.

> Thanks!  But you don't need to do this.  You can add my Reviewed-by after
> we have finished the work on the patch description.

Can you be more specific about how you want this changed?



* Re: [PATCH v3] mm: swap: properly update readahead statistics in unuse_pte_range()
  2020-04-17  4:06   ` Andrew Morton
@ 2020-04-17  5:18     ` Huang, Ying
  2020-04-18  8:51       ` Andrea Righi
  0 siblings, 1 reply; 5+ messages in thread
From: Huang, Ying @ 2020-04-17  5:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Righi, Minchan Kim, Anchal Agarwal, linux-mm, linux-kernel

Andrew Morton <akpm@linux-foundation.org> writes:

> On Fri, 17 Apr 2020 11:01:22 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
>
>> Andrea Righi <andrea.righi@canonical.com> writes:
>> 
>> > In unuse_pte_range() we blindly swap in pages without checking whether
>> > the swap entry is already present in the swap cache.
>> >
>> > By doing this, the hit/miss ratio used by the swap readahead heuristic
>> > is not properly updated (cache hits are never accounted) and this leads
>> > to sub-optimal performance during swapoff.
>> 
>> It's more important to describe why we need this patch in the patch
>> description.  So, please add some information about your use case.  And
>> please focus on the technical part instead of the business part.
>
> Confused.  I thought the changelog was quite good.  If "business part"
> means "end user effect of the patch" then that's a very important
> thing.

Previously, Andrea described their use case in the cloud environment:
hibernate the guest and run swapoff after resuming.  So swapoff
performance is important for them.  I think that should be included.
By the business part, I mean something like "Ubuntu used in AWS EC2",
which I think isn't important for the patch description.

>> Thanks!  But you don't need to do this.  You can add my Reviewed-by after
>> we have finished the work on the patch description.
>
> Can you be more specific about how you want this changed?

Please use

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

Best Regards,
Huang, Ying



* Re: [PATCH v3] mm: swap: properly update readahead statistics in unuse_pte_range()
  2020-04-17  5:18     ` Huang, Ying
@ 2020-04-18  8:51       ` Andrea Righi
  0 siblings, 0 replies; 5+ messages in thread
From: Andrea Righi @ 2020-04-18  8:51 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, Minchan Kim, Anchal Agarwal, linux-mm, linux-kernel

On Fri, Apr 17, 2020 at 01:18:37PM +0800, Huang, Ying wrote:
> Andrew Morton <akpm@linux-foundation.org> writes:
> 
> > On Fri, 17 Apr 2020 11:01:22 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
> >
> >> Andrea Righi <andrea.righi@canonical.com> writes:
> >> 
> >> > In unuse_pte_range() we blindly swap in pages without checking whether
> >> > the swap entry is already present in the swap cache.
> >> >
> >> > By doing this, the hit/miss ratio used by the swap readahead heuristic
> >> > is not properly updated (cache hits are never accounted) and this leads
> >> > to sub-optimal performance during swapoff.
> >> 
> >> It's more important to describe why we need this patch in the patch
> >> description.  So, please add some information about your use case.  And
> >> please focus on the technical part instead of the business part.
> >
> > Confused.  I thought the changelog was quite good.  If "business part"
> > means "end user effect of the patch" then that's a very important
> > thing.
> 
> Previously, Andrea has described their use case in the cloud environment
> to hiberate the guest and swapoff after resuming.  So swapoff
> performance is important for them.  I think that should be included.
> For the business part, I mean something like "Ubuntu used in AWS EC2", I
> think that isn't important for the patch description.

I just sent a v4 of this patch adding a "conclusion" section to the
description to better explain the purpose of this patch.  Let me know if
you have any comments on that.

Thanks,
-Andrea


