linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Kuo-Hsin Yang <vovoy@chromium.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.com>, Sonny Rao <sonnyrao@chromium.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Rik van Riel <riel@redhat.com>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Minchan Kim <minchan@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.19 50/50] mm: vmscan: scan anonymous pages on file refaults
Date: Fri, 26 Jul 2019 17:25:25 +0200	[thread overview]
Message-ID: <20190726152306.041848000@linuxfoundation.org> (raw)
In-Reply-To: <20190726152300.760439618@linuxfoundation.org>

From: Kuo-Hsin Yang <vovoy@chromium.org>

commit 2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa upstream.

When file refaults are detected and there are many inactive file pages,
the system never reclaim anonymous pages, the file pages are dropped
aggressively when there are still a lot of cold anonymous pages and
system thrashes.  This issue impacts the performance of applications
with large executable, e.g.  chrome.

With this patch, when file refault is detected, inactive_list_is_low()
always returns true for file pages in get_scan_count() to enable
scanning anonymous pages.

The problem can be reproduced by the following test program.

---8<---
void fallocate_file(const char *filename, off_t size)
{
	struct stat st;
	int fd;

	if (!stat(filename, &st) && st.st_size >= size)
		return;

	fd = open(filename, O_WRONLY | O_CREAT, 0600);
	if (fd < 0) {
		perror("create file");
		exit(1);
	}
	if (posix_fallocate(fd, 0, size)) {
		perror("fallocate");
		exit(1);
	}
	close(fd);
}

long *alloc_anon(long size)
{
	long *start = malloc(size);
	memset(start, 1, size);
	return start;
}

long access_file(const char *filename, long size, long rounds)
{
	int fd, i;
	volatile char *start1, *end1, *start2;
	const int page_size = getpagesize();
	long sum = 0;

	fd = open(filename, O_RDONLY);
	if (fd == -1) {
		perror("open");
		exit(1);
	}

	/*
	 * Some applications, e.g. chrome, use a lot of executable file
	 * pages, map some of the pages with PROT_EXEC flag to simulate
	 * the behavior.
	 */
	start1 = mmap(NULL, size / 2, PROT_READ | PROT_EXEC, MAP_SHARED,
		      fd, 0);
	if (start1 == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	end1 = start1 + size / 2;

	start2 = mmap(NULL, size / 2, PROT_READ, MAP_SHARED, fd, size / 2);
	if (start2 == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	for (i = 0; i < rounds; ++i) {
		struct timeval before, after;
		volatile char *ptr1 = start1, *ptr2 = start2;
		gettimeofday(&before, NULL);
		for (; ptr1 < end1; ptr1 += page_size, ptr2 += page_size)
			sum += *ptr1 + *ptr2;
		gettimeofday(&after, NULL);
		printf("File access time, round %d: %f (sec)
", i,
		       (after.tv_sec - before.tv_sec) +
		       (after.tv_usec - before.tv_usec) / 1000000.0);
	}
	return sum;
}

int main(int argc, char *argv[])
{
	const long MB = 1024 * 1024;
	long anon_mb, file_mb, file_rounds;
	const char filename[] = "large";
	long *ret1;
	long ret2;

	if (argc != 4) {
		printf("usage: thrash ANON_MB FILE_MB FILE_ROUNDS
");
		exit(0);
	}
	anon_mb = atoi(argv[1]);
	file_mb = atoi(argv[2]);
	file_rounds = atoi(argv[3]);

	fallocate_file(filename, file_mb * MB);
	printf("Allocate %ld MB anonymous pages
", anon_mb);
	ret1 = alloc_anon(anon_mb * MB);
	printf("Access %ld MB file pages
", file_mb);
	ret2 = access_file(filename, file_mb * MB, file_rounds);
	printf("Print result to prevent optimization: %ld
",
	       *ret1 + ret2);
	return 0;
}
---8<---

Running the test program on 2GB RAM VM with kernel 5.2.0-rc5, the program
fills ram with 2048 MB memory, access a 200 MB file for 10 times.  Without
this patch, the file cache is dropped aggresively and every access to the
file is from disk.

  $ ./thrash 2048 200 10
  Allocate 2048 MB anonymous pages
  Access 200 MB file pages
  File access time, round 0: 2.489316 (sec)
  File access time, round 1: 2.581277 (sec)
  File access time, round 2: 2.487624 (sec)
  File access time, round 3: 2.449100 (sec)
  File access time, round 4: 2.420423 (sec)
  File access time, round 5: 2.343411 (sec)
  File access time, round 6: 2.454833 (sec)
  File access time, round 7: 2.483398 (sec)
  File access time, round 8: 2.572701 (sec)
  File access time, round 9: 2.493014 (sec)

With this patch, these file pages can be cached.

  $ ./thrash 2048 200 10
  Allocate 2048 MB anonymous pages
  Access 200 MB file pages
  File access time, round 0: 2.475189 (sec)
  File access time, round 1: 2.440777 (sec)
  File access time, round 2: 2.411671 (sec)
  File access time, round 3: 1.955267 (sec)
  File access time, round 4: 0.029924 (sec)
  File access time, round 5: 0.000808 (sec)
  File access time, round 6: 0.000771 (sec)
  File access time, round 7: 0.000746 (sec)
  File access time, round 8: 0.000738 (sec)
  File access time, round 9: 0.000747 (sec)

Checked the swap out stats during the test [1], 19006 pages swapped out
with this patch, 3418 pages swapped out without this patch. There are
more swap out, but I think it's within reasonable range when file backed
data set doesn't fit into the memory.

$ ./thrash 2000 100 2100 5 1 # ANON_MB FILE_EXEC FILE_NOEXEC ROUNDS
PROCESSES Allocate 2000 MB anonymous pages active_anon: 1613644,
inactive_anon: 348656, active_file: 892, inactive_file: 1384 (kB)
pswpout: 7972443, pgpgin: 478615246 Access 100 MB executable file pages
Access 2100 MB regular file pages File access time, round 0: 12.165,
(sec) active_anon: 1433788, inactive_anon: 478116, active_file: 17896,
inactive_file: 24328 (kB) File access time, round 1: 11.493, (sec)
active_anon: 1430576, inactive_anon: 477144, active_file: 25440,
inactive_file: 26172 (kB) File access time, round 2: 11.455, (sec)
active_anon: 1427436, inactive_anon: 476060, active_file: 21112,
inactive_file: 28808 (kB) File access time, round 3: 11.454, (sec)
active_anon: 1420444, inactive_anon: 473632, active_file: 23216,
inactive_file: 35036 (kB) File access time, round 4: 11.479, (sec)
active_anon: 1413964, inactive_anon: 471460, active_file: 31728,
inactive_file: 32224 (kB) pswpout: 7991449 (+ 19006), pgpgin: 489924366
(+ 11309120)

With 4 processes accessing non-overlapping parts of a large file, 30316
pages swapped out with this patch, 5152 pages swapped out without this
patch.  The swapout number is small comparing to pgpgin.

[1]: https://github.com/vovo/testing/blob/master/mem_thrash.c

Link: http://lkml.kernel.org/r/20190701081038.GA83398@google.com
Fixes: e9868505987a ("mm,vmscan: only evict file pages when we have plenty")
Fixes: 7c5bd705d8f9 ("mm: memcg: only evict file pages when we have plenty")
Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Sonny Rao <sonnyrao@chromium.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>	[4.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backported to 4.14.y, 4.19.y, 5.1.y: adjust context]
Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/vmscan.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2190,7 +2190,7 @@ static void shrink_active_list(unsigned
  *   10TB     320        32GB
  */
 static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
-				 struct scan_control *sc, bool actual_reclaim)
+				 struct scan_control *sc, bool trace)
 {
 	enum lru_list active_lru = file * LRU_FILE + LRU_ACTIVE;
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -2216,7 +2216,7 @@ static bool inactive_list_is_low(struct
 	 * rid of the stale workingset quickly.
 	 */
 	refaults = lruvec_page_state(lruvec, WORKINGSET_ACTIVATE);
-	if (file && actual_reclaim && lruvec->refaults != refaults) {
+	if (file && lruvec->refaults != refaults) {
 		inactive_ratio = 0;
 	} else {
 		gb = (inactive + active) >> (30 - PAGE_SHIFT);
@@ -2226,7 +2226,7 @@ static bool inactive_list_is_low(struct
 			inactive_ratio = 1;
 	}
 
-	if (actual_reclaim)
+	if (trace)
 		trace_mm_vmscan_inactive_list_is_low(pgdat->node_id, sc->reclaim_idx,
 			lruvec_lru_size(lruvec, inactive_lru, MAX_NR_ZONES), inactive,
 			lruvec_lru_size(lruvec, active_lru, MAX_NR_ZONES), active,



  parent reply	other threads:[~2019-07-26 15:34 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-26 15:24 [PATCH 4.19 00/50] 4.19.62-stable review Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 01/50] bnx2x: Prevent load reordering in tx completion processing Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 02/50] caif-hsi: fix possible deadlock in cfhsi_exit_module() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 03/50] hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 04/50] igmp: fix memory leak in igmpv3_del_delrec() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 05/50] ipv4: dont set IPv6 only flags to IPv4 addresses Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 06/50] ipv6: rt6_check should return NULL if from is NULL Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 07/50] ipv6: Unlink sibling route in case of failure Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 08/50] net: bcmgenet: use promisc for unsupported filters Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 09/50] net: dsa: mv88e6xxx: wait after reset deactivation Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 10/50] net: make skb_dst_force return true when dst is refcounted Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 11/50] net: neigh: fix multiple neigh timer scheduling Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 12/50] net: openvswitch: fix csum updates for MPLS actions Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 13/50] net: phy: sfp: hwmon: Fix scaling of RX power Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 14/50] net: stmmac: Re-work the queue selection for TSO packets Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 15/50] nfc: fix potential illegal memory access Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 16/50] r8169: fix issue with confused RX unit after PHY power-down on RTL8411b Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 17/50] rxrpc: Fix send on a connected, but unbound socket Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 18/50] sctp: fix error handling on stream scheduler initialization Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 19/50] sky2: Disable MSI on ASUS P6T Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 20/50] tcp: be more careful in tcp_fragment() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 21/50] tcp: fix tcp_set_congestion_control() use from bpf hook Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 22/50] tcp: Reset bytes_acked and bytes_received when disconnecting Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 23/50] vrf: make sure skb->data contains ip header to make routing Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 4.19 24/50] net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 25/50] macsec: fix use-after-free of skb during RX Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 26/50] macsec: fix checksumming after decryption Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 27/50] netrom: fix a memory leak in nr_rx_frame() Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 28/50] netrom: hold sock when setting skb->destructor Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 29/50] net_sched: unset TCQ_F_CAN_BYPASS when adding filters Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 30/50] net/tls: make sure offload also gets the keys wiped Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 31/50] sctp: not bind the socket in sctp_connect Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 32/50] net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 33/50] net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 34/50] net: bridge: dont cache ether dest pointer on input Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 35/50] net: bridge: stp: dont cache eth dest pointer before skb pull Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 36/50] dma-buf: balance refcount inbalance Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 37/50] dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 38/50] gpio: davinci: silence error prints in case of EPROBE_DEFER Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 39/50] MIPS: lb60: Fix pin mappings Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 40/50] perf/core: Fix exclusive events grouping Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 41/50] perf/core: Fix race between close() and fork() Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 42/50] ext4: dont allow any modifications to an immutable file Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 43/50] ext4: enforce the immutable flag on open files Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 44/50] mm: add filemap_fdatawait_range_keep_errors() Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 45/50] jbd2: introduce jbd2_inode dirty range scoping Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 46/50] ext4: use " Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 47/50] ext4: allow directory holes Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 48/50] KVM: nVMX: do not use dangling shadow VMCS after guest reset Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 4.19 49/50] KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested Greg Kroah-Hartman
2019-07-26 15:25 ` Greg Kroah-Hartman [this message]
2019-07-27  2:14 ` [PATCH 4.19 00/50] 4.19.62-stable review kernelci.org bot
2019-07-27  2:35 ` shuah
2019-07-27  4:40 ` Naresh Kamboju
2019-07-27 16:06 ` Guenter Roeck
2019-07-29  9:02 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190726152306.041848000@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    --cc=sonnyrao@chromium.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vdavydov.dev@gmail.com \
    --cc=vovoy@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).