From: Andrea Righi <andrea.righi@canonical.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>,
	Minchan Kim <minchan@kernel.org>,
	Anchal Agarwal <anchalag@amazon.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2] mm: swap: use fixed-size readahead during swapoff
Date: Mon, 13 Apr 2020 13:18:10 +0200	[thread overview]
Message-ID: <20200413111810.GA801367@xps-13> (raw)

The global swap-in readahead policy takes into account previous access
patterns, using a scaling heuristic to dynamically determine the optimal
readahead chunk.
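
The scaling logic itself lives in __swapin_nr_pages() in mm/swap_state.c.
As a rough, userspace-only illustration of the idea (a simplified sketch,
not the kernel code): the readahead window grows with the number of
recent readahead hits, shrinks when the hits stop, and is clamped to the
page-cluster maximum. The helper name and sample numbers below are made
up for the example:

  #include <stdio.h>

  /* Simplified model of a scaling swap-in readahead heuristic. */
  static unsigned long scaled_readahead(unsigned int hits,
                                        unsigned long prev_win,
                                        unsigned long max_pages)
  {
          unsigned long pages = hits + 2;         /* grow with recent hits */

          if (pages > max_pages)                  /* clamp to page-cluster max */
                  pages = max_pages;
          if (pages < prev_win / 2)               /* don't shrink too fast */
                  pages = prev_win / 2;
          return pages;
  }

  int main(void)
  {
          unsigned long win = 1, max_pages = 1UL << 3;    /* page-cluster=3 */
          unsigned int hits[] = { 0, 1, 4, 8, 0, 0 };

          for (unsigned int i = 0; i < sizeof(hits) / sizeof(hits[0]); i++) {
                  win = scaled_readahead(hits[i], win, max_pages);
                  printf("hits=%u -> window=%lu pages\n", hits[i], win);
          }
          return 0;
  }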

This works pretty well in most cases, but, like any heuristic, there are
specific cases where this approach is not ideal, for example the swapoff
scenario.

During swapoff we just want to load all the swapped-out pages back into
memory, and for this specific use case a fixed-size readahead is more
efficient.

The specific use case this patch addresses is improving swapoff
performance when a VM has been hibernated and resumed, and all memory
needs to be forced back to RAM by disabling swap (see the test case
below).

But it is not the only case where a fixed-size readahead can show its
benefits. More generally, the fixed-size approach can be beneficial in
all cases where a task that is using a large amount of swapped-out
memory needs to load it back into RAM as fast as possible.

Testing environment
===================

 - Host:
   CPU: 1.8GHz Intel Core i7-8565U (quad-core, 8MB cache)
   HDD: PC401 NVMe SK hynix 512GB
   MEM: 16GB

 - Guest (kvm):
   8GB of RAM
   virtio block driver
   16GB swap file on ext4 (/swapfile)

Test case
=========
 - allocate 85% of memory (see the memory hog sketch below)
 - `systemctl hibernate` to force all the pages to be swapped-out to the
   swap file
 - resume the system
 - measure the time that swapoff takes to complete:
   # /usr/bin/time swapoff /swapfile
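
The exact tool used for the "allocate 85% of memory" step is not
specified above; a minimal memory hog along these lines can be used (a
hypothetical sketch, not the exact program behind the numbers below):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  /* Allocate ~85% of physical RAM and touch every page so it is really
   * backed by anonymous memory (and later pushed out by hibernation). */
  int main(void)
  {
          long pagesz = sysconf(_SC_PAGESIZE);
          long pages = sysconf(_SC_PHYS_PAGES);
          size_t target = (size_t)pages * pagesz / 100 * 85;
          char *buf = malloc(target);

          if (!buf) {
                  perror("malloc");
                  return 1;
          }
          memset(buf, 0xaa, target);      /* fault in every page */
          printf("allocated %zu bytes, sleeping\n", target);
          pause();                        /* keep the memory resident */
          return 0;
  }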

Result
======
                5.6 vanilla   5.6 w/ this patch
                -----------   -----------------
page-cluster=1       26.77s              21.25s
page-cluster=2       28.29s              12.66s
page-cluster=3       22.09s               8.77s
page-cluster=4       21.50s               7.60s
page-cluster=5       25.35s               7.75s
page-cluster=6       23.19s               8.32s
page-cluster=7       22.25s               9.40s
page-cluster=8       22.09s               8.93s
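
(For reference: page-cluster selects the maximum readahead size as a
power of two, max_pages = 1 << page_cluster, so page-cluster=1 means at
most 2 pages per readahead, page-cluster=4 means 16 pages and
page-cluster=8 means 256 pages; 8KB, 64KB and 1MB respectively with 4KB
pages.)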

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
---
Changes in v2:
 - avoid introducing a new ABI to select the fixed-size readahead

NOTE: after running some tests with this new patch I don't see any
performance difference, so I'm reporting the same test results as in the
previous version.
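
For context: the swapoff path already marks the calling task as an OOM
"origin" (mm/swapfile.c wraps try_to_unuse() with
set_current_oom_origin()/clear_current_oom_origin()), which is why
oom_task_origin(current) can be used here to detect swapoff without
introducing a new ABI. Roughly, from include/linux/oom.h (simplified
excerpt as of v5.6):

  /* include/linux/oom.h (simplified) */
  static inline void set_current_oom_origin(void)
  {
          current->signal->oom_flag_origin = true;
  }

  static inline bool oom_task_origin(const struct task_struct *p)
  {
          return p->signal->oom_flag_origin;
  }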

 mm/swap_state.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index ebed37bbf7a3..c71abc8df304 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -20,6 +20,7 @@
 #include <linux/migrate.h>
 #include <linux/vmalloc.h>
 #include <linux/swap_slots.h>
+#include <linux/oom.h>
 #include <linux/huge_mm.h>
 
 #include <asm/pgtable.h>
@@ -507,6 +508,14 @@ static unsigned long swapin_nr_pages(unsigned long offset)
 	max_pages = 1 << READ_ONCE(page_cluster);
 	if (max_pages <= 1)
 		return 1;
+	/*
+	 * If the current task is using too much memory, or swapoff is
+	 * running, simply use the max readahead size. Since we likely want
+	 * to load a lot of pages back into memory, a fixed-size max
+	 * readahead can give better performance in this case.
+	 */
+	if (oom_task_origin(current))
+		return max_pages;
 
 	hits = atomic_xchg(&swapin_readahead_hits, 0);
 	pages = __swapin_nr_pages(prev_offset, offset, hits, max_pages,
-- 
2.25.1


Thread overview: 16+ messages
2020-04-13 11:18 Andrea Righi [this message]
2020-04-13 13:00 ` [PATCH v2] mm: swap: use fixed-size readahead during swapoff Huang, Ying
2020-04-13 13:31   ` Andrea Righi
2020-04-14  1:31     ` Huang, Ying
2020-04-14 13:05       ` Andrea Righi
2020-04-15  2:37         ` Huang, Ying
2020-04-15  7:32           ` Andrea Righi
2020-04-15  7:44             ` Huang, Ying
2020-04-15  9:19               ` Andrea Righi
2020-04-16  0:44                 ` Huang, Ying
2020-04-15 12:00             ` Andrea Righi
2020-04-16  0:41               ` Huang, Ying
2020-04-16 17:21                 ` Andrea Righi
2020-04-13 13:13 ` Huang, Ying
2020-04-13 13:26   ` Andrea Righi
2020-04-14  1:45     ` Huang, Ying
