* [PATCH stable 5.4] mm: swap: properly update readahead statistics in unuse_pte_range()
From: Luiz Capitulino @ 2023-01-30 15:28 UTC
To: stable
Cc: Andrea Righi, Andrew Morton, Huang, Ying, Minchan Kim,
Anchal Agarwal, Hugh Dickins, Vineeth Remanan Pillai,
Kelley Nielsen, Linus Torvalds
From: Andrea Righi <andrea.righi@canonical.com>
Commit ebc5951eea499314f6fbbde20e295f1345c67330 upstream.
[ This fixes a performance issue we're seeing in AWS instances when
running swapoff and using the global readahead algorithm. For a
particular instance configuration, without this fix I/O throughput
is very low during swapoff (about 15 MB/s); with this patch it
reaches 500 MB/s. Tested swapoff with different workloads with
this patch applied. 5.10 and later already have this fix ]
In unuse_pte_range() we blindly swap-in pages without checking if the
swap entry is already present in the swap cache.
By doing this, the hit/miss ratio used by the swap readahead heuristic
is not properly updated and this leads to non-optimal performance during
swapoff.
Tracing the distribution of the readahead size returned by the swap
readahead heuristic during swapoff shows that a small readahead size is
used most of the time as if we had only misses (this happens both with
cluster and vma readahead), for example:
r::swapin_nr_pages(unsigned long offset):unsigned long:$retval
COUNT EVENT
36948 $retval = 8
44151 $retval = 4
49290 $retval = 1
527771 $retval = 2
Checking if the swap entry is already present in the swap cache, instead,
allows the readahead statistics to be properly updated, and the heuristic
behaves better during swapoff, selecting a bigger readahead size:
r::swapin_nr_pages(unsigned long offset):unsigned long:$retval
COUNT EVENT
1618 $retval = 1
4960 $retval = 2
41315 $retval = 4
103521 $retval = 8
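As an aside, the effect behind these two distributions can be reproduced
with a small self-contained userspace toy model of a hit/miss-driven
readahead window. This is only a sketch of the general idea (the window
doubles after a recorded hit and halves after a miss, capped at 8 pages);
it is not the kernel's actual swapin_nr_pages() logic, whose bookkeeping
and constants differ:

/*
 * Toy model (userspace, NOT kernel code): a readahead window that
 * doubles after a hit and halves after a miss, capped at RA_MAX.
 * Pure misses keep the window at 1 page; recorded hits let it grow
 * to RA_MAX, mirroring the two distributions above.
 */
#include <stdio.h>

#define RA_MAX 8

static unsigned int window = 1;	/* current readahead size in pages */
static unsigned int hits;	/* hits recorded since the last fault */

static unsigned int model_nr_pages(void)
{
	unsigned int pages;

	if (hits)
		pages = window * 2;	/* recent hits: grow the window */
	else
		pages = window / 2;	/* only misses: shrink it */

	if (pages < 1)
		pages = 1;
	if (pages > RA_MAX)
		pages = RA_MAX;

	window = pages;
	hits = 0;
	return pages;
}

int main(void)
{
	int i;

	/* Every fault treated as a miss: the window stays small. */
	for (i = 0; i < 5; i++)
		printf("miss-only: window = %u\n", model_nr_pages());

	/* Swap-cache hits are recorded: the window grows to RA_MAX. */
	for (i = 0; i < 5; i++) {
		hits = 1;
		printf("with hits: window = %u\n", model_nr_pages());
	}

	return 0;
}

Built with any C compiler, the first loop stays at a window of 1 page
while the second quickly saturates at 8, which mirrors why recording
swap-cache hits shifts the real distribution above toward $retval = 8.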
In terms of swapoff performance the result is the following:
Testing environment
===================
- Host:
CPU: 1.8GHz Intel Core i7-8565U (quad-core, 8MB cache)
HDD: PC401 NVMe SK hynix 512GB
MEM: 16GB
- Guest (kvm):
8GB of RAM
virtio block driver
16GB swap file on ext4 (/swapfile)
Test case
=========
- allocate 85% of memory
- `systemctl hibernate` to force all the pages to be swapped-out to the
swap file
- resume the system
- measure the time that swapoff takes to complete:
# /usr/bin/time swapoff /swapfile
Result (swapoff time)
=====================
                    5.6 vanilla   5.6 w/ this patch
                    -----------   -----------------
cluster-readahead        22.09s              12.19s
vma-readahead            18.20s              15.33s
Conclusion
==========
The specific use case this patch is addressing is to improve swapoff
performance in cloud environments when a VM has been hibernated, resumed
and all the memory needs to be forced back to RAM by disabling swap.
This change better exploits the advantages of the readahead heuristic
during swapoff, and this improvement speeds up the resume process of
such VMs.
[andrea.righi@canonical.com: update changelog]
Link: http://lkml.kernel.org/r/20200418084705.GA147642@xps-13
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Anchal Agarwal <anchalag@amazon.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Cc: Kelley Nielsen <kelleynnn@gmail.com>
Link: http://lkml.kernel.org/r/20200416180132.GB3352@xps-13
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
mm/swapfile.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f6964212c6c8..fe5995c38ea4 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1950,10 +1950,14 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		pte_unmap(pte);
 		swap_map = &si->swap_map[offset];
 
-		vmf.vma = vma;
-		vmf.address = addr;
-		vmf.pmd = pmd;
-		page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, &vmf);
+		page = lookup_swap_cache(entry, vma, addr);
+		if (!page) {
+			vmf.vma = vma;
+			vmf.address = addr;
+			vmf.pmd = pmd;
+			page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
+						&vmf);
+		}
 		if (!page) {
 			if (*swap_map == 0 || *swap_map == SWAP_MAP_BAD)
 				goto try_next;
--
2.38.1
* Re: [PATCH stable 5.4] mm: swap: properly update readahead statistics in unuse_pte_range()
From: Greg KH @ 2023-02-03 9:19 UTC
To: Luiz Capitulino
Cc: stable, Andrea Righi, Andrew Morton, Huang, Ying, Minchan Kim,
Anchal Agarwal, Hugh Dickins, Vineeth Remanan Pillai,
Kelley Nielsen, Linus Torvalds
On Mon, Jan 30, 2023 at 03:28:23PM +0000, Luiz Capitulino wrote:
> From: Andrea Righi <andrea.righi@canonical.com>
>
> Commit ebc5951eea499314f6fbbde20e295f1345c67330 upstream.
>
> [...]
>
> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Anchal Agarwal <anchalag@amazon.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Vineeth Remanan Pillai <vpillai@digitalocean.com>
> Cc: Kelley Nielsen <kelleynnn@gmail.com>
> Link: http://lkml.kernel.org/r/20200416180132.GB3352@xps-13
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
You forwarded on a backport without signing off on it yourself, sorry, I
can't take this as-is. Please fix up and resend.
thanks,
greg k-h
* Re: [PATCH stable 5.4] mm: swap: properly update readahead statistics in unuse_pte_range()
From: Luiz Capitulino @ 2023-02-03 14:33 UTC
To: Greg KH
Cc: stable, Andrea Righi, Andrew Morton, Huang, Ying, Minchan Kim,
Anchal Agarwal, Hugh Dickins, Vineeth Remanan Pillai,
Kelley Nielsen, Linus Torvalds
On 2023-02-03 04:19, Greg KH wrote:
>
> On Mon, Jan 30, 2023 at 03:28:23PM +0000, Luiz Capitulino wrote:
>> From: Andrea Righi <andrea.righi@canonical.com>
>>
>> [...]
>>
>> ---
>
> You forwarded on a backport without signing off on it yourself, sorry, I
> can't take this as-is. Please fix up and resend.
Duh... resending...
>
> thanks,
>
> greg k-h