linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org
Cc: "Andi Kleen" <ak@linux.intel.com>,
	"Aneesh Kumar" <aneesh.kumar@linux.ibm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Hillf Danton" <hdanton@sina.com>, "Jens Axboe" <axboe@kernel.dk>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Michael Larabel" <Michael@michaellarabel.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Tejun Heo" <tj@kernel.org>, "Vlastimil Babka" <vbabka@suse.cz>,
	"Will Deacon" <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	page-reclaim@google.com, "Yu Zhao" <yuzhao@google.com>,
	"Brian Geffon" <bgeffon@google.com>,
	"Jan Alexander Steffens" <heftig@archlinux.org>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	"Steven Barrett" <steven@liquorix.net>,
	"Suleiman Souhlal" <suleiman@google.com>,
	"Daniel Byrne" <djbyrne@mtu.edu>,
	"Donald Carr" <d@chaos-reins.com>,
	"Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Konstantin Kharlamov" <Hi-Angel@yandex.ru>,
	"Shuang Zhai" <szhai2@cs.rochester.edu>,
	"Sofia Trinh" <sofia.trinh@edi.works>,
	"Vaibhav Jain" <vaibhav@linux.ibm.com>
Subject: [PATCH v11 09/14] mm: multi-gen LRU: optimize multiple memcgs
Date: Tue, 17 May 2022 19:46:28 -0600	[thread overview]
Message-ID: <20220518014632.922072-10-yuzhao@google.com> (raw)
In-Reply-To: <20220518014632.922072-1-yuzhao@google.com>

When multiple memcgs are available, it is possible to make better
choices based on generations and tiers and therefore improve the
overall performance under global memory pressure. This patch adds a
rudimentary optimization to select memcgs that can drop single-use
unmapped clean pages first. Doing so reduces the chance of going into
the aging path or swapping. These two operations can be costly.

A typical example that benefits from this optimization is a server
running mixed types of workloads, e.g., heavy anon workload in one
memcg and heavy buffered I/O workload in the other.

Though this optimization can be applied to both kswapd and direct
reclaim, it is only added to kswapd to keep the patchset manageable.
Later improvements will cover the direct reclaim path.

Server benchmark results:
  Mixed workloads:
    fio (buffered I/O): +[1, 3]%
                IOPS         BW
      patch1-8: 2154k        8415MiB/s
      patch1-9: 2205k        8613MiB/s

    memcached (anon): +[132, 136]%
                Ops/sec      KB/sec
      patch1-8: 819618.49    31838.48
      patch1-9: 1916516.06   74447.92

  Mixed workloads:
    fio (buffered I/O): +[59, 61]%
                IOPS         BW
      5.18-rc1: 1378k        5385MiB/s
      patch1-9: 2205k        8613MiB/s

    memcached (anon): +[229, 233]%
                Ops/sec      KB/sec
      5.18-rc1: 578946.00    22489.44
      patch1-9: 1916516.06   74447.92

  Configurations:
    (changes since patch 6)

    cat mixed.sh
    modprobe brd rd_nr=2 rd_size=56623104

    swapoff -a
    mkswap /dev/ram0
    swapon /dev/ram0

    mkfs.ext4 /dev/ram1
    mount -t ext4 /dev/ram1 /mnt

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=P:P -c 1 -t 36 \
      --ratio 1:0 --pipeline 8 -d 2000

    fio -name=mglru --numjobs=36 --directory=/mnt --size=1408m \
      --buffered=1 --ioengine=io_uring --iodepth=128 \
      --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
      --rw=randread --random_distribution=random --norandommap \
      --time_based --ramp_time=10m --runtime=90m --group_reporting &
    pid=$!

    sleep 200

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=R:R -c 1 -t 36 \
      --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed

    kill -INT $pid
    wait

Client benchmark results:
  no change (CONFIG_MEMCG=n)

Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 mm/vmscan.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f292e7e761b1..a7e768675707 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -128,6 +128,13 @@ struct scan_control {
 	/* Always discard instead of demoting to lower tier memory */
 	unsigned int no_demotion:1;
 
+#ifdef CONFIG_LRU_GEN
+	/* help make better choices when multiple memcgs are available */
+	unsigned int memcgs_need_aging:1;
+	unsigned int memcgs_need_swapping:1;
+	unsigned int memcgs_avoid_swapping:1;
+#endif
+
 	/* Allocation order */
 	s8 order;
 
@@ -4290,6 +4297,22 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 
 	VM_WARN_ON_ONCE(!current_is_kswapd());
 
+	/*
+	 * To reduce the chance of going into the aging path or swapping, which
+	 * can be costly, optimistically skip them unless their corresponding
+	 * flags were cleared in the eviction path. This improves the overall
+	 * performance when multiple memcgs are available.
+	 */
+	if (!sc->memcgs_need_aging) {
+		sc->memcgs_need_aging = true;
+		sc->memcgs_avoid_swapping = !sc->memcgs_need_swapping;
+		sc->memcgs_need_swapping = true;
+		return;
+	}
+
+	sc->memcgs_need_swapping = true;
+	sc->memcgs_avoid_swapping = true;
+
 	current->reclaim_state->mm_walk = &pgdat->mm_walk;
 
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
@@ -4699,7 +4722,8 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
 	return scanned;
 }
 
-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
+static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
+			bool *swapped)
 {
 	int type;
 	int scanned;
@@ -4765,6 +4789,9 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 
 	sc->nr_reclaimed += reclaimed;
 
+	if (type == LRU_GEN_ANON && swapped)
+		*swapped = true;
+
 	return scanned;
 }
 
@@ -4793,8 +4820,10 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool
 	if (!nr_to_scan)
 		return 0;
 
-	if (!need_aging)
+	if (!need_aging) {
+		sc->memcgs_need_aging = false;
 		return nr_to_scan;
+	}
 
 	/* leave the work to lru_gen_age_node() */
 	if (current_is_kswapd())
@@ -4816,6 +4845,8 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 {
 	struct blk_plug plug;
 	long scanned = 0;
+	bool swapped = false;
+	unsigned long reclaimed = sc->nr_reclaimed;
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
 	lru_add_drain();
@@ -4841,13 +4872,19 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		if (!nr_to_scan)
 			break;
 
-		delta = evict_folios(lruvec, sc, swappiness);
+		delta = evict_folios(lruvec, sc, swappiness, &swapped);
 		if (!delta)
 			break;
 
+		if (sc->memcgs_avoid_swapping && swappiness < 200 && swapped)
+			break;
+
 		scanned += delta;
-		if (scanned >= nr_to_scan)
+		if (scanned >= nr_to_scan) {
+			if (!swapped && sc->nr_reclaimed - reclaimed >= MIN_LRU_BATCH)
+				sc->memcgs_need_swapping = false;
 			break;
+		}
 
 		cond_resched();
 	}
-- 
2.36.0.550.gb090851708-goog


  parent reply	other threads:[~2022-05-18  1:47 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-18  1:46 [PATCH v11 00/14] Multi-Gen LRU Framework Yu Zhao
2022-05-18  1:46 ` [PATCH v11 01/14] mm: x86, arm64: add arch_has_hw_pte_young() Yu Zhao
2022-05-18  1:46 ` [PATCH v11 02/14] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG Yu Zhao
2022-05-18  1:46 ` [PATCH v11 03/14] mm/vmscan.c: refactor shrink_node() Yu Zhao
2022-05-18  1:46 ` [PATCH v11 04/14] Revert "include/linux/mm_inline.h: fold __update_lru_size() into its sole caller" Yu Zhao
2022-05-18  1:46 ` [PATCH v11 05/14] mm: multi-gen LRU: groundwork Yu Zhao
2022-06-09  5:33   ` zhong jiang
2022-05-18  1:46 ` [PATCH v11 06/14] mm: multi-gen LRU: minimal implementation Yu Zhao
2022-06-09 12:34   ` zhong jiang
2022-06-09 14:46     ` zhong jiang
2022-05-18  1:46 ` [PATCH v11 07/14] mm: multi-gen LRU: exploit locality in rmap Yu Zhao
2022-06-06  9:25   ` Barry Song
2022-06-07  7:37     ` Barry Song
2022-06-07 10:21       ` Will Deacon
2022-06-06 22:37         ` Barry Song
2022-06-07 10:43           ` Will Deacon
2022-06-07 21:06             ` Yu Zhao
2022-06-08  0:43               ` Barry Song
2022-06-08 15:51                 ` Linus Torvalds
2022-06-08 22:45                   ` Barry Song
2022-06-16 21:55                     ` Yu Zhao
2022-06-16 22:33                       ` Barry Song
2022-06-16 23:29                         ` Yu Zhao
2022-06-17  1:42                           ` Yu Zhao
2022-06-17  2:01                             ` Barry Song
2022-06-17  3:03                               ` Yu Zhao
2022-06-17  3:17                                 ` Yu Zhao
2022-06-19 20:36                                   ` Yu Zhao
2022-06-19 21:56                                     ` Barry Song
2022-06-07 19:07       ` Yu Zhao
2022-06-08  7:48         ` Barry Song
2022-06-07 18:58     ` Yu Zhao
2022-05-18  1:46 ` [PATCH v11 08/14] mm: multi-gen LRU: support page table walks Yu Zhao
2022-05-18  1:46 ` Yu Zhao [this message]
2022-05-18  1:46 ` [PATCH v11 10/14] mm: multi-gen LRU: kill switch Yu Zhao
2022-05-18  1:46 ` [PATCH v11 11/14] mm: multi-gen LRU: thrashing prevention Yu Zhao
2022-05-18  1:46 ` [PATCH v11 12/14] mm: multi-gen LRU: debugfs interface Yu Zhao
2022-05-18  1:46 ` [PATCH v11 13/14] mm: multi-gen LRU: admin guide Yu Zhao
2022-05-18  1:46 ` [PATCH v11 14/14] mm: multi-gen LRU: design doc Yu Zhao
2022-05-18  2:05 ` [PATCH v11 00/14] Multi-Gen LRU Framework Jens Axboe
2022-06-07 22:47   ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220518014632.922072-10-yuzhao@google.com \
    --to=yuzhao@google.com \
    --cc=Hi-Angel@yandex.ru \
    --cc=Michael@michaellarabel.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=axboe@kernel.dk \
    --cc=bgeffon@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=d@chaos-reins.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=djbyrne@mtu.edu \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=heftig@archlinux.org \
    --cc=holger@applied-asynchrony.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=oleksandr@natalenko.name \
    --cc=page-reclaim@google.com \
    --cc=peterz@infradead.org \
    --cc=rppt@kernel.org \
    --cc=sofia.trinh@edi.works \
    --cc=steven@liquorix.net \
    --cc=suleiman@google.com \
    --cc=szhai2@cs.rochester.edu \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vaibhav@linux.ibm.com \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).