* [PATCH mm-unstable v1 0/8] mm: multi-gen LRU: memcg LRU
From: Yu Zhao @ 2022-12-01 22:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
	Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm,
	linux-kernel, Yu Zhao

An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
since each node and memcg combination has an LRU of folios (see
mem_cgroup_lruvec()).
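
For illustration only: the sketch below (visit_lruvecs() is a
made-up name, not code from this series) shows the kind of traversal
the memcg LRU improves on for global reclaim, namely visiting one
lruvec, i.e., one LRU of folios, per node and memcg combination via
the standard memcg iterator:

  /*
   * Illustrative sketch: visit every LRU of folios on one node, i.e.,
   * one lruvec per (node, memcg) pair. Without a memcg LRU, global
   * reclaim traverses memcgs this way, i.e., in O(n) even in the best
   * case.
   */
  static void visit_lruvecs(struct pglist_data *pgdat)
  {
  	struct mem_cgroup *memcg = mem_cgroup_iter(NULL, NULL, NULL);

  	do {
  		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);

  		/* age or evict from this lruvec, then move to the next memcg */
  		(void)lruvec;
  	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
  }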

Its goal is to improve the scalability of global reclaim, which is
critical to systemwide memory overcommit in data centers. Note that
memcg reclaim is currently out of scope.

Its memory overhead is one pointer in each lruvec and a negligible
amount in each node. In terms of traversing memcgs during global
reclaim, it improves the best-case complexity from O(n) to O(1) and
does not affect the worst-case complexity O(n): a reclaimer that
finds a suitable memcg at the head of the first list it tries
finishes in O(1), whereas draining the entire structure remains
O(n). Therefore, on average, it has a sublinear complexity, in
contrast to the current linear complexity.

The basic structure of an memcg LRU can be understood by an analogy
to the active/inactive LRU (of folios):
1. It has the young and the old (generations);
2. Its linked lists have the head and the tail;
3. The increment of max_seq triggers promotion;
4. Other events, e.g., offlining an memcg, trigger similar
   operations.

In terms of global reclaim, it has two distinct features:
1. Sharding, which allows each thread to start at a random memcg (in
   the old generation) and improves parallelism;
2. Eventual fairness, which allows direct reclaim to bail out early
   and reduces latency without affecting fairness over a longer
   timeframe.

The commit message in patch 6 details the workflow:
https://lore.kernel.org/r/20221201223923.873696-7-yuzhao@google.com/

The following is a simple test to quickly verify its effectiveness.
More benchmarks are coming soon.

  Test design:
  1. Create multiple memcgs.
  2. Each memcg contains a job (fio).
  3. All jobs access the same amount of memory randomly.
  4. The system does not experience global memory pressure.
  5. Periodically write to the root memory.reclaim.

  Desired outcome:
  1. All memcgs have similar pgsteal, i.e.,
     stddev(pgsteal)/mean(pgsteal) is close to 0%.
  2. The total pgsteal is close to the total requested through
     memory.reclaim, i.e., sum(pgsteal)/sum(requested) is close to
     100%.

  Actual outcome [1]:
             stddev(pgsteal)/mean(pgsteal) sum(pgsteal)/sum(requested)
  MGLRU off  75%                           425%
  MGLRU on   20%                           95%

  ####################################################################
  MEMCGS=128

  for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
      mkdir /sys/fs/cgroup/memcg$memcg
  done
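
  # Each background start() moves its subshell into a dedicated memcg
  # (via $BASHPID), then runs fio there: random reads and writes over
  # 1920 MiB, throttled to 64 MB/s each, for up to 10 hours.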

  start() {
      echo $BASHPID > /sys/fs/cgroup/memcg$memcg/cgroup.procs

      fio --name=memcg$memcg --numjobs=1 --ioengine=mmap \
          --filename=/dev/zero --size=1920M --rw=randrw \
          --rate=64m,64m --random_distribution=random \
          --fadvise_hint=0 --time_based --runtime=10h \
          --group_reporting --minimal
  }

  for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
      start &
  done
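
  # Let the jobs warm up for 10 minutes, then request 256 MiB from the
  # root memcg every 6 seconds: 600 rounds, i.e., ~150 GiB requested
  # over about an hour.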

  sleep 600

  for ((i = 0; i < 600; i++)); do
      echo 256m >/sys/fs/cgroup/memory.reclaim
      sleep 6
  done

  for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
      grep "pgsteal " /sys/fs/cgroup/memcg$memcg/memory.stat
  done
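
  # Illustrative post-processing (not part of the original test): the
  # fairness metric stddev(pgsteal)/mean(pgsteal) can be computed by
  # piping the grep output above through, e.g.:
  #   awk '{s+=$2; ss+=$2*$2; n++} END {m=s/n; print sqrt(ss/n-m^2)/m}'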
  ####################################################################

[1]: Obtained by running the above script (which touches less than
     256GB of memory in total) on an EPYC 7B13 with 512GB of DRAM
     for over an hour.

Yu Zhao (8):
  mm: multi-gen LRU: rename lru_gen_struct to lru_gen_folio
  mm: multi-gen LRU: rename lrugen->lists[] to lrugen->folios[]
  mm: multi-gen LRU: remove eviction fairness safeguard
  mm: multi-gen LRU: remove aging fairness safeguard
  mm: multi-gen LRU: shuffle should_run_aging()
  mm: multi-gen LRU: per-node lru_gen_folio lists
  mm: multi-gen LRU: clarify scan_control flags
  mm: multi-gen LRU: simplify arch_has_hw_pte_young() check

 Documentation/mm/multigen_lru.rst |   8 +-
 include/linux/memcontrol.h        |  10 +
 include/linux/mm_inline.h         |  25 +-
 include/linux/mmzone.h            | 127 ++++-
 mm/memcontrol.c                   |  16 +
 mm/page_alloc.c                   |   1 +
 mm/vmscan.c                       | 765 ++++++++++++++++++++----------
 mm/workingset.c                   |   4 +-
 8 files changed, 687 insertions(+), 269 deletions(-)

-- 
2.39.0.rc0.267.gcb52ba06e7-goog


* [PATCH mm-unstable v1 1/8] mm: multi-gen LRU: rename lru_gen_struct to lru_gen_folio
From: Yu Zhao @ 2022-12-01 22:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
	Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm,
	linux-kernel, Yu Zhao

The new name lru_gen_folio will be more distinct from the coming
lru_gen_memcg.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 include/linux/mm_inline.h |  4 ++--
 include/linux/mmzone.h    |  6 +++---
 mm/vmscan.c               | 34 +++++++++++++++++-----------------
 mm/workingset.c           |  4 ++--
 4 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index e8ed225d8f7c..f63968bd7de5 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -178,7 +178,7 @@ static inline void lru_gen_update_size(struct lruvec *lruvec, struct folio *foli
 	int zone = folio_zonenum(folio);
 	int delta = folio_nr_pages(folio);
 	enum lru_list lru = type * LRU_INACTIVE_FILE;
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 
 	VM_WARN_ON_ONCE(old_gen != -1 && old_gen >= MAX_NR_GENS);
 	VM_WARN_ON_ONCE(new_gen != -1 && new_gen >= MAX_NR_GENS);
@@ -224,7 +224,7 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio,
 	int gen = folio_lru_gen(folio);
 	int type = folio_is_file_lru(folio);
 	int zone = folio_zonenum(folio);
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 
 	VM_WARN_ON_ONCE_FOLIO(gen != -1, folio);
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5f74891556f3..bd3e4689f72d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -404,7 +404,7 @@ enum {
  * The number of pages in each generation is eventually consistent and therefore
  * can be transiently negative when reset_batch_size() is pending.
  */
-struct lru_gen_struct {
+struct lru_gen_folio {
 	/* the aging increments the youngest generation number */
 	unsigned long max_seq;
 	/* the eviction increments the oldest generation numbers */
@@ -461,7 +461,7 @@ struct lru_gen_mm_state {
 struct lru_gen_mm_walk {
 	/* the lruvec under reclaim */
 	struct lruvec *lruvec;
-	/* unstable max_seq from lru_gen_struct */
+	/* unstable max_seq from lru_gen_folio */
 	unsigned long max_seq;
 	/* the next address within an mm to scan */
 	unsigned long next_addr;
@@ -524,7 +524,7 @@ struct lruvec {
 	unsigned long			flags;
 #ifdef CONFIG_LRU_GEN
 	/* evictable pages divided into generations */
-	struct lru_gen_struct		lrugen;
+	struct lru_gen_folio		lrugen;
 	/* to concurrently iterate lru_gen_mm_list */
 	struct lru_gen_mm_state		mm_state;
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9356a3ee639c..fcb4ac351f93 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3197,7 +3197,7 @@ static int get_nr_gens(struct lruvec *lruvec, int type)
 
 static bool __maybe_unused seq_is_valid(struct lruvec *lruvec)
 {
-	/* see the comment on lru_gen_struct */
+	/* see the comment on lru_gen_folio */
 	return get_nr_gens(lruvec, LRU_GEN_FILE) >= MIN_NR_GENS &&
 	       get_nr_gens(lruvec, LRU_GEN_FILE) <= get_nr_gens(lruvec, LRU_GEN_ANON) &&
 	       get_nr_gens(lruvec, LRU_GEN_ANON) <= MAX_NR_GENS;
@@ -3594,7 +3594,7 @@ struct ctrl_pos {
 static void read_ctrl_pos(struct lruvec *lruvec, int type, int tier, int gain,
 			  struct ctrl_pos *pos)
 {
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 
 	pos->refaulted = lrugen->avg_refaulted[type][tier] +
@@ -3609,7 +3609,7 @@ static void read_ctrl_pos(struct lruvec *lruvec, int type, int tier, int gain,
 static void reset_ctrl_pos(struct lruvec *lruvec, int type, bool carryover)
 {
 	int hist, tier;
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	bool clear = carryover ? NR_HIST_GENS == 1 : NR_HIST_GENS > 1;
 	unsigned long seq = carryover ? lrugen->min_seq[type] : lrugen->max_seq + 1;
 
@@ -3686,7 +3686,7 @@ static int folio_update_gen(struct folio *folio, int gen)
 static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
 {
 	int type = folio_is_file_lru(folio);
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
 	unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
 
@@ -3731,7 +3731,7 @@ static void update_batch_size(struct lru_gen_mm_walk *walk, struct folio *folio,
 static void reset_batch_size(struct lruvec *lruvec, struct lru_gen_mm_walk *walk)
 {
 	int gen, type, zone;
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 
 	walk->batched = 0;
 
@@ -4248,7 +4248,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 {
 	int zone;
 	int remaining = MAX_LRU_BATCH;
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
 
 	if (type == LRU_GEN_ANON && !can_swap)
@@ -4284,7 +4284,7 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec, bool can_swap)
 {
 	int gen, type, zone;
 	bool success = false;
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	DEFINE_MIN_SEQ(lruvec);
 
 	VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
@@ -4305,7 +4305,7 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec, bool can_swap)
 		;
 	}
 
-	/* see the comment on lru_gen_struct */
+	/* see the comment on lru_gen_folio */
 	if (can_swap) {
 		min_seq[LRU_GEN_ANON] = min(min_seq[LRU_GEN_ANON], min_seq[LRU_GEN_FILE]);
 		min_seq[LRU_GEN_FILE] = max(min_seq[LRU_GEN_ANON], lrugen->min_seq[LRU_GEN_FILE]);
@@ -4327,7 +4327,7 @@ static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
 {
 	int prev, next;
 	int type, zone;
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 
 	spin_lock_irq(&lruvec->lru_lock);
 
@@ -4385,7 +4385,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 	bool success;
 	struct lru_gen_mm_walk *walk;
 	struct mm_struct *mm = NULL;
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 
 	VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
 
@@ -4450,7 +4450,7 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsig
 	unsigned long old = 0;
 	unsigned long young = 0;
 	unsigned long total = 0;
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 
 	for (type = !can_swap; type < ANON_AND_FILE; type++) {
@@ -4735,7 +4735,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, int tier_idx)
 	int delta = folio_nr_pages(folio);
 	int refs = folio_lru_refs(folio);
 	int tier = lru_tier_from_refs(refs);
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 
 	VM_WARN_ON_ONCE_FOLIO(gen >= MAX_NR_GENS, folio);
 
@@ -4835,7 +4835,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 	int scanned = 0;
 	int isolated = 0;
 	int remaining = MAX_LRU_BATCH;
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 
 	VM_WARN_ON_ONCE(!list_empty(list));
@@ -5235,7 +5235,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 
 static bool __maybe_unused state_is_valid(struct lruvec *lruvec)
 {
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 
 	if (lrugen->enabled) {
 		enum lru_list lru;
@@ -5514,7 +5514,7 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
 	int i;
 	int type, tier;
 	int hist = lru_hist_from_seq(seq);
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 
 	for (tier = 0; tier < MAX_NR_TIERS; tier++) {
 		seq_printf(m, "            %10d", tier);
@@ -5564,7 +5564,7 @@ static int lru_gen_seq_show(struct seq_file *m, void *v)
 	unsigned long seq;
 	bool full = !debugfs_real_fops(m->file)->write;
 	struct lruvec *lruvec = v;
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int nid = lruvec_pgdat(lruvec)->node_id;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
@@ -5818,7 +5818,7 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 	int i;
 	int gen, type, zone;
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 
 	lrugen->max_seq = MIN_NR_GENS + 1;
 	lrugen->enabled = lru_gen_enabled();
diff --git a/mm/workingset.c b/mm/workingset.c
index 1a86645b7b3c..fd666584515c 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -223,7 +223,7 @@ static void *lru_gen_eviction(struct folio *folio)
 	unsigned long token;
 	unsigned long min_seq;
 	struct lruvec *lruvec;
-	struct lru_gen_struct *lrugen;
+	struct lru_gen_folio *lrugen;
 	int type = folio_is_file_lru(folio);
 	int delta = folio_nr_pages(folio);
 	int refs = folio_lru_refs(folio);
@@ -252,7 +252,7 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	unsigned long token;
 	unsigned long min_seq;
 	struct lruvec *lruvec;
-	struct lru_gen_struct *lrugen;
+	struct lru_gen_folio *lrugen;
 	struct mem_cgroup *memcg;
 	struct pglist_data *pgdat;
 	int type = folio_is_file_lru(folio);
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


* [PATCH mm-unstable v1 2/8] mm: multi-gen LRU: rename lrugen->lists[] to lrugen->folios[]
From: Yu Zhao @ 2022-12-01 22:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
	Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm,
	linux-kernel, Yu Zhao

lru_gen_folio will be chained into per-node lists by the coming
lrugen->list.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 Documentation/mm/multigen_lru.rst |  8 ++++----
 include/linux/mm_inline.h         |  4 ++--
 include/linux/mmzone.h            |  8 ++++----
 mm/vmscan.c                       | 20 ++++++++++----------
 4 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index d7062c6a8946..d8f721f98868 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -89,15 +89,15 @@ variables are monotonically increasing.
 
 Generation numbers are truncated into ``order_base_2(MAX_NR_GENS+1)``
 bits in order to fit into the gen counter in ``folio->flags``. Each
-truncated generation number is an index to ``lrugen->lists[]``. The
+truncated generation number is an index to ``lrugen->folios[]``. The
 sliding window technique is used to track at least ``MIN_NR_GENS`` and
 at most ``MAX_NR_GENS`` generations. The gen counter stores a value
 within ``[1, MAX_NR_GENS]`` while a page is on one of
-``lrugen->lists[]``; otherwise it stores zero.
+``lrugen->folios[]``; otherwise it stores zero.
 
 Each generation is divided into multiple tiers. A page accessed ``N``
 times through file descriptors is in tier ``order_base_2(N)``. Unlike
-generations, tiers do not have dedicated ``lrugen->lists[]``. In
+generations, tiers do not have dedicated ``lrugen->folios[]``. In
 contrast to moving across generations, which requires the LRU lock,
 moving across tiers only involves atomic operations on
 ``folio->flags`` and therefore has a negligible cost. A feedback loop
@@ -127,7 +127,7 @@ page mapped by this PTE to ``(max_seq%MAX_NR_GENS)+1``.
 Eviction
 --------
 The eviction consumes old generations. Given an ``lruvec``, it
-increments ``min_seq`` when ``lrugen->lists[]`` indexed by
+increments ``min_seq`` when ``lrugen->folios[]`` indexed by
 ``min_seq%MAX_NR_GENS`` becomes empty. To select a type and a tier to
 evict from, it first compares ``min_seq[]`` to select the older type.
 If both types are equally old, it selects the one whose first tier has
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index f63968bd7de5..da38e3d962e2 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -256,9 +256,9 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio,
 	lru_gen_update_size(lruvec, folio, -1, gen);
 	/* for folio_rotate_reclaimable() */
 	if (reclaiming)
-		list_add_tail(&folio->lru, &lrugen->lists[gen][type][zone]);
+		list_add_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 	else
-		list_add(&folio->lru, &lrugen->lists[gen][type][zone]);
+		list_add(&folio->lru, &lrugen->folios[gen][type][zone]);
 
 	return true;
 }
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bd3e4689f72d..02e432374471 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -312,7 +312,7 @@ enum lruvec_flags {
  * They form a sliding window of a variable size [MIN_NR_GENS, MAX_NR_GENS]. An
  * offset within MAX_NR_GENS, i.e., gen, indexes the LRU list of the
  * corresponding generation. The gen counter in folio->flags stores gen+1 while
- * a page is on one of lrugen->lists[]. Otherwise it stores 0.
+ * a page is on one of lrugen->folios[]. Otherwise it stores 0.
  *
  * A page is added to the youngest generation on faulting. The aging needs to
  * check the accessed bit at least twice before handing this page over to the
@@ -324,8 +324,8 @@ enum lruvec_flags {
  * rest of generations, if they exist, are considered inactive. See
  * lru_gen_is_active().
  *
- * PG_active is always cleared while a page is on one of lrugen->lists[] so that
- * the aging needs not to worry about it. And it's set again when a page
+ * PG_active is always cleared while a page is on one of lrugen->folios[] so
+ * that the aging needs not to worry about it. And it's set again when a page
  * considered active is isolated for non-reclaiming purposes, e.g., migration.
  * See lru_gen_add_folio() and lru_gen_del_folio().
  *
@@ -412,7 +412,7 @@ struct lru_gen_folio {
 	/* the birth time of each generation in jiffies */
 	unsigned long timestamps[MAX_NR_GENS];
 	/* the multi-gen LRU lists, lazily sorted on eviction */
-	struct list_head lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
+	struct list_head folios[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
 	/* the multi-gen LRU sizes, eventually consistent */
 	long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
 	/* the exponential moving average of refaulted */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fcb4ac351f93..ebab1ec3d400 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4256,7 +4256,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 
 	/* prevent cold/hot inversion if force_scan is true */
 	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
-		struct list_head *head = &lrugen->lists[old_gen][type][zone];
+		struct list_head *head = &lrugen->folios[old_gen][type][zone];
 
 		while (!list_empty(head)) {
 			struct folio *folio = lru_to_folio(head);
@@ -4267,7 +4267,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
 			new_gen = folio_inc_gen(lruvec, folio, false);
-			list_move_tail(&folio->lru, &lrugen->lists[new_gen][type][zone]);
+			list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
 
 			if (!--remaining)
 				return false;
@@ -4295,7 +4295,7 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec, bool can_swap)
 			gen = lru_gen_from_seq(min_seq[type]);
 
 			for (zone = 0; zone < MAX_NR_ZONES; zone++) {
-				if (!list_empty(&lrugen->lists[gen][type][zone]))
+				if (!list_empty(&lrugen->folios[gen][type][zone]))
 					goto next;
 			}
 
@@ -4760,7 +4760,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, int tier_idx)
 
 	/* promoted */
 	if (gen != lru_gen_from_seq(lrugen->min_seq[type])) {
-		list_move(&folio->lru, &lrugen->lists[gen][type][zone]);
+		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
 
@@ -4769,7 +4769,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, int tier_idx)
 		int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 
 		gen = folio_inc_gen(lruvec, folio, false);
-		list_move_tail(&folio->lru, &lrugen->lists[gen][type][zone]);
+		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 
 		WRITE_ONCE(lrugen->protected[hist][type][tier - 1],
 			   lrugen->protected[hist][type][tier - 1] + delta);
@@ -4781,7 +4781,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, int tier_idx)
 	if (folio_test_locked(folio) || folio_test_writeback(folio) ||
 	    (type == LRU_GEN_FILE && folio_test_dirty(folio))) {
 		gen = folio_inc_gen(lruvec, folio, true);
-		list_move(&folio->lru, &lrugen->lists[gen][type][zone]);
+		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
 
@@ -4848,7 +4848,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 	for (zone = sc->reclaim_idx; zone >= 0; zone--) {
 		LIST_HEAD(moved);
 		int skipped = 0;
-		struct list_head *head = &lrugen->lists[gen][type][zone];
+		struct list_head *head = &lrugen->folios[gen][type][zone];
 
 		while (!list_empty(head)) {
 			struct folio *folio = lru_to_folio(head);
@@ -5248,7 +5248,7 @@ static bool __maybe_unused state_is_valid(struct lruvec *lruvec)
 		int gen, type, zone;
 
 		for_each_gen_type_zone(gen, type, zone) {
-			if (!list_empty(&lrugen->lists[gen][type][zone]))
+			if (!list_empty(&lrugen->folios[gen][type][zone]))
 				return false;
 		}
 	}
@@ -5293,7 +5293,7 @@ static bool drain_evictable(struct lruvec *lruvec)
 	int remaining = MAX_LRU_BATCH;
 
 	for_each_gen_type_zone(gen, type, zone) {
-		struct list_head *head = &lruvec->lrugen.lists[gen][type][zone];
+		struct list_head *head = &lruvec->lrugen.folios[gen][type][zone];
 
 		while (!list_empty(head)) {
 			bool success;
@@ -5827,7 +5827,7 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
 		lrugen->timestamps[i] = jiffies;
 
 	for_each_gen_type_zone(gen, type, zone)
-		INIT_LIST_HEAD(&lrugen->lists[gen][type][zone]);
+		INIT_LIST_HEAD(&lrugen->folios[gen][type][zone]);
 
 	lruvec->mm_state.seq = MIN_NR_GENS;
 	init_waitqueue_head(&lruvec->mm_state.wait);
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


* [PATCH mm-unstable v1 3/8] mm: multi-gen LRU: remove eviction fairness safeguard
From: Yu Zhao @ 2022-12-01 22:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
	Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm,
	linux-kernel, Yu Zhao

Recall that the eviction consumes the oldest generation: first it
bucket-sorts folios whose gen counters were updated by the aging and
reclaims the rest; then it increments lrugen->min_seq.

The current eviction fairness safeguard for global reclaim has a
dilemma: when there are multiple eligible memcgs, should it continue
or stop upon meeting the reclaim goal? If it continues, it overshoots
and increases direct reclaim latency; if it stops, it loses fairness
between memcgs it has taken memory away from and those it has yet to.

With memcg LRU, the eviction, while ensuring eventual fairness, will
stop upon meeting its goal. Therefore the current eviction fairness
safeguard for global reclaim will not be needed.

Note that memcg LRU only applies to global reclaim. For memcg
reclaim, the eviction will continue even if it is overshooting; this
becomes unconditional purely as a code simplification.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/vmscan.c | 81 +++++++++++++++--------------------------------------
 1 file changed, 23 insertions(+), 58 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ebab1ec3d400..d714a777c88b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -449,6 +449,11 @@ static bool cgroup_reclaim(struct scan_control *sc)
 	return sc->target_mem_cgroup;
 }
 
+static bool global_reclaim(struct scan_control *sc)
+{
+	return !sc->target_mem_cgroup || mem_cgroup_is_root(sc->target_mem_cgroup);
+}
+
 /**
  * writeback_throttling_sane - is the usual dirty throttling mechanism available?
  * @sc: scan_control in question
@@ -499,6 +504,11 @@ static bool cgroup_reclaim(struct scan_control *sc)
 	return false;
 }
 
+static bool global_reclaim(struct scan_control *sc)
+{
+	return true;
+}
+
 static bool writeback_throttling_sane(struct scan_control *sc)
 {
 	return true;
@@ -4991,8 +5001,7 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
 	return scanned;
 }
 
-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
-			bool *need_swapping)
+static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
 {
 	int type;
 	int scanned;
@@ -5081,9 +5090,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 		goto retry;
 	}
 
-	if (need_swapping && type == LRU_GEN_ANON)
-		*need_swapping = true;
-
 	return scanned;
 }
 
@@ -5122,67 +5128,26 @@ static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *
 	return min_seq[!can_swap] + MIN_NR_GENS <= max_seq ? nr_to_scan : 0;
 }
 
-static bool should_abort_scan(struct lruvec *lruvec, unsigned long seq,
-			      struct scan_control *sc, bool need_swapping)
+static unsigned long get_nr_to_reclaim(struct scan_control *sc)
 {
-	int i;
-	DEFINE_MAX_SEQ(lruvec);
+	/* don't abort memcg reclaim to ensure fairness */
+	if (!global_reclaim(sc))
+		return -1;
 
-	if (!current_is_kswapd()) {
-		/* age each memcg at most once to ensure fairness */
-		if (max_seq - seq > 1)
-			return true;
+	/* discount the previous progress for kswapd */
+	if (current_is_kswapd())
+		return sc->nr_to_reclaim + sc->last_reclaimed;
 
-		/* over-swapping can increase allocation latency */
-		if (sc->nr_reclaimed >= sc->nr_to_reclaim && need_swapping)
-			return true;
-
-		/* give this thread a chance to exit and free its memory */
-		if (fatal_signal_pending(current)) {
-			sc->nr_reclaimed += MIN_LRU_BATCH;
-			return true;
-		}
-
-		if (cgroup_reclaim(sc))
-			return false;
-	} else if (sc->nr_reclaimed - sc->last_reclaimed < sc->nr_to_reclaim)
-		return false;
-
-	/* keep scanning at low priorities to ensure fairness */
-	if (sc->priority > DEF_PRIORITY - 2)
-		return false;
-
-	/*
-	 * A minimum amount of work was done under global memory pressure. For
-	 * kswapd, it may be overshooting. For direct reclaim, the allocation
-	 * may succeed if all suitable zones are somewhat safe. In either case,
-	 * it's better to stop now, and restart later if necessary.
-	 */
-	for (i = 0; i <= sc->reclaim_idx; i++) {
-		unsigned long wmark;
-		struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
-
-		if (!managed_zone(zone))
-			continue;
-
-		wmark = current_is_kswapd() ? high_wmark_pages(zone) : low_wmark_pages(zone);
-		if (wmark > zone_page_state(zone, NR_FREE_PAGES))
-			return false;
-	}
-
-	sc->nr_reclaimed += MIN_LRU_BATCH;
-
-	return true;
+	return max(sc->nr_to_reclaim, compact_gap(sc->order));
 }
 
 static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	struct blk_plug plug;
 	bool need_aging = false;
-	bool need_swapping = false;
 	unsigned long scanned = 0;
 	unsigned long reclaimed = sc->nr_reclaimed;
-	DEFINE_MAX_SEQ(lruvec);
+	unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
 
 	lru_add_drain();
 
@@ -5206,7 +5171,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		if (!nr_to_scan)
 			goto done;
 
-		delta = evict_folios(lruvec, sc, swappiness, &need_swapping);
+		delta = evict_folios(lruvec, sc, swappiness);
 		if (!delta)
 			goto done;
 
@@ -5214,7 +5179,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		if (scanned >= nr_to_scan)
 			break;
 
-		if (should_abort_scan(lruvec, max_seq, sc, need_swapping))
+		if (sc->nr_reclaimed >= nr_to_reclaim)
 			break;
 
 		cond_resched();
@@ -5661,7 +5626,7 @@ static int run_eviction(struct lruvec *lruvec, unsigned long seq, struct scan_co
 		if (sc->nr_reclaimed >= nr_to_reclaim)
 			return 0;
 
-		if (!evict_folios(lruvec, sc, swappiness, NULL))
+		if (!evict_folios(lruvec, sc, swappiness))
 			return 0;
 
 		cond_resched();
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


* [PATCH mm-unstable v1 4/8] mm: multi-gen LRU: remove aging fairness safeguard
From: Yu Zhao @ 2022-12-01 22:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
	Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm,
	linux-kernel, Yu Zhao

Recall that the aging produces the youngest generation: first it scans
for accessed folios and updates their gen counters; then it increments
lrugen->max_seq.

The current aging fairness safeguard for kswapd uses two passes to
ensure the fairness to multiple eligible memcgs. On the first pass,
which is shared with the eviction, it checks whether all eligible
memcgs are low on cold folios. If so, it requires a second pass, on
which it ages all those memcgs at the same time.

With memcg LRU, the aging, while ensuring eventual fairness, will run
when necessary. Therefore the current aging fairness safeguard for
kswapd will not be needed.

Note that memcg LRU only applies to global reclaim. For memcg reclaim,
the aging can be unfair to different memcgs, i.e., their
lrugen->max_seq can be incremented at different paces.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/vmscan.c | 150 +++++++++++++++++++++++++---------------------------
 1 file changed, 71 insertions(+), 79 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d714a777c88b..67967a4b18a9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -137,7 +137,6 @@ struct scan_control {
 
 #ifdef CONFIG_LRU_GEN
 	/* help kswapd make better choices among multiple memcgs */
-	unsigned int memcgs_need_aging:1;
 	unsigned long last_reclaimed;
 #endif
 
@@ -4453,7 +4452,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 	return true;
 }
 
-static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsigned long *min_seq,
+static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
 			     struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
 {
 	int gen, type, zone;
@@ -4462,6 +4461,13 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsig
 	unsigned long total = 0;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+	DEFINE_MIN_SEQ(lruvec);
+
+	/* whether this lruvec is completely out of cold folios */
+	if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
+		*nr_to_scan = 0;
+		return true;
+	}
 
 	for (type = !can_swap; type < ANON_AND_FILE; type++) {
 		unsigned long seq;
@@ -4490,8 +4496,6 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsig
 	 * stalls when the number of generations reaches MIN_NR_GENS. Hence, the
 	 * ideal number of generations is MIN_NR_GENS+1.
 	 */
-	if (min_seq[!can_swap] + MIN_NR_GENS > max_seq)
-		return true;
 	if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
 		return false;
 
@@ -4510,40 +4514,54 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsig
 	return false;
 }
 
-static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc, unsigned long min_ttl)
+static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
 {
-	bool need_aging;
-	unsigned long nr_to_scan;
-	int swappiness = get_swappiness(lruvec, sc);
+	int gen, type, zone;
+	unsigned long total = 0;
+	bool can_swap = get_swappiness(lruvec, sc);
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
 	DEFINE_MIN_SEQ(lruvec);
 
+	for (type = !can_swap; type < ANON_AND_FILE; type++) {
+		unsigned long seq;
+
+		for (seq = min_seq[type]; seq <= max_seq; seq++) {
+			gen = lru_gen_from_seq(seq);
+
+			for (zone = 0; zone < MAX_NR_ZONES; zone++)
+				total += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
+		}
+	}
+
+	/* whether the size is big enough to be helpful */
+	return mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
+}
+
+static bool lruvec_is_reclaimable(struct lruvec *lruvec, struct scan_control *sc,
+				  unsigned long min_ttl)
+{
+	int gen;
+	unsigned long birth;
+	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+	DEFINE_MIN_SEQ(lruvec);
+
 	VM_WARN_ON_ONCE(sc->memcg_low_reclaim);
 
+	/* see the comment on lru_gen_folio */
+	gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
+	birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
+
+	if (time_is_after_jiffies(birth + min_ttl))
+		return false;
+
+	if (!lruvec_is_sizable(lruvec, sc))
+		return false;
+
 	mem_cgroup_calculate_protection(NULL, memcg);
 
-	if (mem_cgroup_below_min(memcg))
-		return false;
-
-	need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, swappiness, &nr_to_scan);
-
-	if (min_ttl) {
-		int gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
-		unsigned long birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
-
-		if (time_is_after_jiffies(birth + min_ttl))
-			return false;
-
-		/* the size is likely too small to be helpful */
-		if (!nr_to_scan && sc->priority != DEF_PRIORITY)
-			return false;
-	}
-
-	if (need_aging)
-		try_to_inc_max_seq(lruvec, max_seq, sc, swappiness, false);
-
-	return true;
+	return !mem_cgroup_below_min(memcg);
 }
 
 /* to protect the working set of the last N jiffies */
@@ -4552,46 +4570,32 @@ static unsigned long lru_gen_min_ttl __read_mostly;
 static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 {
 	struct mem_cgroup *memcg;
-	bool success = false;
 	unsigned long min_ttl = READ_ONCE(lru_gen_min_ttl);
 
 	VM_WARN_ON_ONCE(!current_is_kswapd());
 
 	sc->last_reclaimed = sc->nr_reclaimed;
 
-	/*
-	 * To reduce the chance of going into the aging path, which can be
-	 * costly, optimistically skip it if the flag below was cleared in the
-	 * eviction path. This improves the overall performance when multiple
-	 * memcgs are available.
-	 */
-	if (!sc->memcgs_need_aging) {
-		sc->memcgs_need_aging = true;
-		return;
-	}
-
-	set_mm_walk(pgdat);
-
-	memcg = mem_cgroup_iter(NULL, NULL, NULL);
-	do {
-		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
-
-		if (age_lruvec(lruvec, sc, min_ttl))
-			success = true;
-
-		cond_resched();
-	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
-
-	clear_mm_walk();
-
 	/* check the order to exclude compaction-induced reclaim */
-	if (success || !min_ttl || sc->order)
+	if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
 		return;
 
+	memcg = mem_cgroup_iter(NULL, NULL, NULL);
+	do {
+		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+
+		if (lruvec_is_reclaimable(lruvec, sc, min_ttl)) {
+			mem_cgroup_iter_break(NULL, memcg);
+			return;
+		}
+
+		cond_resched();
+	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
+
 	/*
 	 * The main goal is to OOM kill if every generation from all memcgs is
 	 * younger than min_ttl. However, another possibility is all memcgs are
-	 * either below min or empty.
+	 * either too small or below min.
 	 */
 	if (mutex_trylock(&oom_lock)) {
 		struct oom_control oc = {
@@ -5099,33 +5103,27 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
  *    reclaim.
  */
 static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
-				    bool can_swap, bool *need_aging)
+				    bool can_swap)
 {
 	unsigned long nr_to_scan;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
-	DEFINE_MIN_SEQ(lruvec);
 
 	if (mem_cgroup_below_min(memcg) ||
 	    (mem_cgroup_below_low(memcg) && !sc->memcg_low_reclaim))
 		return 0;
 
-	*need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, can_swap, &nr_to_scan);
-	if (!*need_aging)
+	if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
 		return nr_to_scan;
 
 	/* skip the aging path at the default priority */
 	if (sc->priority == DEF_PRIORITY)
-		goto done;
-
-	/* leave the work to lru_gen_age_node() */
-	if (current_is_kswapd())
-		return 0;
-
-	if (try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false))
 		return nr_to_scan;
-done:
-	return min_seq[!can_swap] + MIN_NR_GENS <= max_seq ? nr_to_scan : 0;
+
+	try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false);
+
+	/* skip this lruvec as it's low on cold folios */
+	return 0;
 }
 
 static unsigned long get_nr_to_reclaim(struct scan_control *sc)
@@ -5144,9 +5142,7 @@ static unsigned long get_nr_to_reclaim(struct scan_control *sc)
 static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	struct blk_plug plug;
-	bool need_aging = false;
 	unsigned long scanned = 0;
-	unsigned long reclaimed = sc->nr_reclaimed;
 	unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
 
 	lru_add_drain();
@@ -5167,13 +5163,13 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		else
 			swappiness = 0;
 
-		nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness, &need_aging);
+		nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
 		if (!nr_to_scan)
-			goto done;
+			break;
 
 		delta = evict_folios(lruvec, sc, swappiness);
 		if (!delta)
-			goto done;
+			break;
 
 		scanned += delta;
 		if (scanned >= nr_to_scan)
@@ -5185,10 +5181,6 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		cond_resched();
 	}
 
-	/* see the comment in lru_gen_age_node() */
-	if (sc->nr_reclaimed - reclaimed >= MIN_LRU_BATCH && !need_aging)
-		sc->memcgs_need_aging = false;
-done:
 	clear_mm_walk();
 
 	blk_finish_plug(&plug);
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


* [PATCH mm-unstable v1 5/8] mm: multi-gen LRU: shuffle should_run_aging()
From: Yu Zhao @ 2022-12-01 22:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
	Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm,
	linux-kernel, Yu Zhao

Move should_run_aging() next to its only caller left.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/vmscan.c | 124 ++++++++++++++++++++++++++--------------------------
 1 file changed, 62 insertions(+), 62 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 67967a4b18a9..0557adce75c5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4452,68 +4452,6 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 	return true;
 }
 
-static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
-			     struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
-{
-	int gen, type, zone;
-	unsigned long old = 0;
-	unsigned long young = 0;
-	unsigned long total = 0;
-	struct lru_gen_folio *lrugen = &lruvec->lrugen;
-	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
-	DEFINE_MIN_SEQ(lruvec);
-
-	/* whether this lruvec is completely out of cold folios */
-	if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
-		*nr_to_scan = 0;
-		return true;
-	}
-
-	for (type = !can_swap; type < ANON_AND_FILE; type++) {
-		unsigned long seq;
-
-		for (seq = min_seq[type]; seq <= max_seq; seq++) {
-			unsigned long size = 0;
-
-			gen = lru_gen_from_seq(seq);
-
-			for (zone = 0; zone < MAX_NR_ZONES; zone++)
-				size += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
-
-			total += size;
-			if (seq == max_seq)
-				young += size;
-			else if (seq + MIN_NR_GENS == max_seq)
-				old += size;
-		}
-	}
-
-	/* try to scrape all its memory if this memcg was deleted */
-	*nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
-
-	/*
-	 * The aging tries to be lazy to reduce the overhead, while the eviction
-	 * stalls when the number of generations reaches MIN_NR_GENS. Hence, the
-	 * ideal number of generations is MIN_NR_GENS+1.
-	 */
-	if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
-		return false;
-
-	/*
-	 * It's also ideal to spread pages out evenly, i.e., 1/(MIN_NR_GENS+1)
-	 * of the total number of pages for each generation. A reasonable range
-	 * for this average portion is [1/MIN_NR_GENS, 1/(MIN_NR_GENS+2)]. The
-	 * aging cares about the upper bound of hot pages, while the eviction
-	 * cares about the lower bound of cold pages.
-	 */
-	if (young * MIN_NR_GENS > total)
-		return true;
-	if (old * (MIN_NR_GENS + 2) < total)
-		return true;
-
-	return false;
-}
-
 static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
 {
 	int gen, type, zone;
@@ -5097,6 +5035,68 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	return scanned;
 }
 
+static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
+			     struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
+{
+	int gen, type, zone;
+	unsigned long old = 0;
+	unsigned long young = 0;
+	unsigned long total = 0;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
+	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+	DEFINE_MIN_SEQ(lruvec);
+
+	/* whether this lruvec is completely out of cold folios */
+	if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
+		*nr_to_scan = 0;
+		return true;
+	}
+
+	for (type = !can_swap; type < ANON_AND_FILE; type++) {
+		unsigned long seq;
+
+		for (seq = min_seq[type]; seq <= max_seq; seq++) {
+			unsigned long size = 0;
+
+			gen = lru_gen_from_seq(seq);
+
+			for (zone = 0; zone < MAX_NR_ZONES; zone++)
+				size += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
+
+			total += size;
+			if (seq == max_seq)
+				young += size;
+			else if (seq + MIN_NR_GENS == max_seq)
+				old += size;
+		}
+	}
+
+	/* try to scrape all its memory if this memcg was deleted */
+	*nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
+
+	/*
+	 * The aging tries to be lazy to reduce the overhead, while the eviction
+	 * stalls when the number of generations reaches MIN_NR_GENS. Hence, the
+	 * ideal number of generations is MIN_NR_GENS+1.
+	 */
+	if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
+		return false;
+
+	/*
+	 * It's also ideal to spread pages out evenly, i.e., 1/(MIN_NR_GENS+1)
+	 * of the total number of pages for each generation. A reasonable range
+	 * for this average portion is [1/MIN_NR_GENS, 1/(MIN_NR_GENS+2)]. The
+	 * aging cares about the upper bound of hot pages, while the eviction
+	 * cares about the lower bound of cold pages.
+	 */
+	if (young * MIN_NR_GENS > total)
+		return true;
+	if (old * (MIN_NR_GENS + 2) < total)
+		return true;
+
+	return false;
+}
+
 /*
  * For future optimizations:
  * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


* [PATCH mm-unstable v1 6/8] mm: multi-gen LRU: per-node lru_gen_folio lists
From: Yu Zhao @ 2022-12-01 22:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
	Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm,
	linux-kernel, Yu Zhao

For each node, memcgs are divided into two generations: the old and
the young. For each generation, memcgs are randomly sharded into
multiple bins to improve scalability. For each bin, an RCU hlist_nulls
is virtually divided into three segments: the head, the tail and the
default.

An onlining memcg is added to the tail of a random bin in the old
generation. The eviction starts at the head of a random bin in the
old generation. The per-node memcg generation counter, whose
remainder (mod 2) indexes the old generation, is incremented when
all its bins become empty.

There are four operations:
1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin
   in its current generation (old or young) and updates its "seg" to
   "head";
2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin
   in its current generation (old or young) and updates its "seg" to
   "tail";
3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in
   the old generation, updates its "gen" to "old" and resets its "seg"
   to "default";
4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin
   in the young generation, updates its "gen" to "young" and resets
   its "seg" to "default".

The events that trigger the above operations are:
1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
2. The first attempt to reclaim an memcg below low, which triggers
   MEMCG_LRU_TAIL;
3. The first attempt to reclaim an memcg below reclaimable size
   threshold, which triggers MEMCG_LRU_TAIL;
4. The second attempt to reclaim an memcg below reclaimable size
   threshold, which triggers MEMCG_LRU_YOUNG;
5. Attempting to reclaim an memcg below min, which triggers
   MEMCG_LRU_YOUNG;
6. Finishing the aging on the eviction path, which triggers
   MEMCG_LRU_YOUNG;
7. Offlining an memcg, which triggers MEMCG_LRU_OLD.

Note that memcg LRU only applies to global reclaim. For memcg reclaim,
it still relies on mem_cgroup_iter().
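
The sketch below illustrates the onlining case (memcg_lru_add_tail()
is a made-up name; the actual code is in lru_gen_online_memcg() and
lru_gen_rotate_memcg() in mm/vmscan.c): an onlining memcg lands at
the tail of a random bin in the old generation, using the
get_memcg_gen()/get_memcg_bin() helpers added by this patch:

  static void memcg_lru_add_tail(struct pglist_data *pgdat, struct lruvec *lruvec)
  {
  	int gen, bin;

  	spin_lock_irq(&pgdat->memcg_lru.lock);

  	/* the remainder (mod MEMCG_NR_GENS) of seq indexes the old generation */
  	gen = get_memcg_gen(pgdat->memcg_lru.seq);
  	/* shard across bins at random to improve scalability */
  	bin = get_memcg_bin(get_random_u32());

  	lruvec->lrugen.gen = gen;
  	hlist_nulls_add_tail_rcu(&lruvec->lrugen.list,
  				 &pgdat->memcg_lru.fifo[gen][bin]);
  	pgdat->memcg_lru.nr_memcgs[gen]++;

  	spin_unlock_irq(&pgdat->memcg_lru.lock);
  }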

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 include/linux/memcontrol.h |  10 +
 include/linux/mm_inline.h  |  17 ++
 include/linux/mmzone.h     | 113 ++++++++++-
 mm/memcontrol.c            |  16 ++
 mm/page_alloc.c            |   1 +
 mm/vmscan.c                | 373 +++++++++++++++++++++++++++++++++----
 6 files changed, 495 insertions(+), 35 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e1644a24009c..f9a44d32e763 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -790,6 +790,11 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 	percpu_ref_put(&objcg->refcnt);
 }
 
+static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+{
+	return !memcg || css_tryget(&memcg->css);
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
@@ -1290,6 +1295,11 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 {
 }
 
+static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+{
+	return true;
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 }
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index da38e3d962e2..c1fd3922dc5d 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -122,6 +122,18 @@ static inline bool lru_gen_in_fault(void)
 	return current->in_lru_fault;
 }
 
+#ifdef CONFIG_MEMCG
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+	return READ_ONCE(lruvec->lrugen.seg);
+}
+#else
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+	return 0;
+}
+#endif
+
 static inline int lru_gen_from_seq(unsigned long seq)
 {
 	return seq % MAX_NR_GENS;
@@ -297,6 +309,11 @@ static inline bool lru_gen_in_fault(void)
 	return false;
 }
 
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+	return 0;
+}
+
 static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
 {
 	return false;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 02e432374471..87b3b5a2aac4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -7,6 +7,7 @@
 
 #include <linux/spinlock.h>
 #include <linux/list.h>
+#include <linux/list_nulls.h>
 #include <linux/wait.h>
 #include <linux/bitops.h>
 #include <linux/cache.h>
@@ -367,6 +368,15 @@ struct page_vma_mapped_walk;
 #define LRU_GEN_MASK		((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
 #define LRU_REFS_MASK		((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
 
+/* see the comment on MEMCG_NR_GENS */
+enum {
+	MEMCG_LRU_NOP,
+	MEMCG_LRU_HEAD,
+	MEMCG_LRU_TAIL,
+	MEMCG_LRU_OLD,
+	MEMCG_LRU_YOUNG,
+};
+
 #ifdef CONFIG_LRU_GEN
 
 enum {
@@ -426,6 +436,14 @@ struct lru_gen_folio {
 	atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
 	/* whether the multi-gen LRU is enabled */
 	bool enabled;
+#ifdef CONFIG_MEMCG
+	/* the memcg generation this lru_gen_folio belongs to */
+	u8 gen;
+	/* the list segment this lru_gen_folio belongs to */
+	u8 seg;
+	/* per-node lru_gen_folio list for global reclaim */
+	struct hlist_nulls_node list;
+#endif
 };
 
 enum {
@@ -479,12 +497,83 @@ void lru_gen_init_lruvec(struct lruvec *lruvec);
 void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
 
 #ifdef CONFIG_MEMCG
+
+/*
+ * For each node, memcgs are divided into two generations: the old and the
+ * young. For each generation, memcgs are randomly sharded into multiple bins
+ * to improve scalability. For each bin, the hlist_nulls is virtually divided
+ * into three segments: the head, the tail and the default.
+ *
+ * An onlining memcg is added to the tail of a random bin in the old generation.
+ * The eviction starts at the head of a random bin in the old generation. The
+ * per-node memcg generation counter, whose remainder (mod MEMCG_NR_GENS) indexes
+ * the old generation, is incremented when all its bins become empty.
+ *
+ * There are four operations:
+ * 1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in its
+ *    current generation (old or young) and updates its "seg" to "head";
+ * 2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in its
+ *    current generation (old or young) and updates its "seg" to "tail";
+ * 3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in the old
+ *    generation, updates its "gen" to "old" and resets its "seg" to "default";
+ * 4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin in the
+ *    young generation, updates its "gen" to "young" and resets its "seg" to
+ *    "default".
+ *
+ * The events that trigger the above operations are:
+ * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
+ * 2. The first attempt to reclaim an memcg below low, which triggers
+ *    MEMCG_LRU_TAIL;
+ * 3. The first attempt to reclaim an memcg below reclaimable size threshold,
+ *    which triggers MEMCG_LRU_TAIL;
+ * 4. The second attempt to reclaim an memcg below reclaimable size threshold,
+ *    which triggers MEMCG_LRU_YOUNG;
+ * 5. Attempting to reclaim an memcg below min, which triggers MEMCG_LRU_YOUNG;
+ * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
+ * 7. Offlining an memcg, which triggers MEMCG_LRU_OLD.
+ */
+#define MEMCG_NR_GENS	2
+#define MEMCG_NR_BINS	8
+
+struct lru_gen_memcg {
+	/* the per-node memcg generation counter */
+	unsigned long seq;
+	/* each memcg has one lru_gen_folio per node */
+	unsigned long nr_memcgs[MEMCG_NR_GENS];
+	/* per-node lru_gen_folio list for global reclaim */
+	struct hlist_nulls_head	fifo[MEMCG_NR_GENS][MEMCG_NR_BINS];
+	/* protects the above */
+	spinlock_t lock;
+};
+
+void lru_gen_init_pgdat(struct pglist_data *pgdat);
+
 void lru_gen_init_memcg(struct mem_cgroup *memcg);
 void lru_gen_exit_memcg(struct mem_cgroup *memcg);
-#endif
+void lru_gen_online_memcg(struct mem_cgroup *memcg);
+void lru_gen_offline_memcg(struct mem_cgroup *memcg);
+void lru_gen_release_memcg(struct mem_cgroup *memcg);
+void lru_gen_rotate_memcg(struct lruvec *lruvec, int op);
+
+#else /* !CONFIG_MEMCG */
+
+#define MEMCG_NR_GENS	1
+
+struct lru_gen_memcg {
+};
+
+static inline void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+}
+
+#endif /* CONFIG_MEMCG */
 
 #else /* !CONFIG_LRU_GEN */
 
+static inline void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+}
+
 static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 }
@@ -494,6 +583,7 @@ static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 }
 
 #ifdef CONFIG_MEMCG
+
 static inline void lru_gen_init_memcg(struct mem_cgroup *memcg)
 {
 }
@@ -501,7 +591,24 @@ static inline void lru_gen_init_memcg(struct mem_cgroup *memcg)
 static inline void lru_gen_exit_memcg(struct mem_cgroup *memcg)
 {
 }
-#endif
+
+static inline void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+}
+
+#endif /* CONFIG_MEMCG */
 
 #endif /* CONFIG_LRU_GEN */
 
@@ -1219,6 +1326,8 @@ typedef struct pglist_data {
 #ifdef CONFIG_LRU_GEN
 	/* kswap mm walk data */
 	struct lru_gen_mm_walk	mm_walk;
+	/* lru_gen_folio list */
+	struct lru_gen_memcg memcg_lru;
 #endif
 
 	CACHELINE_PADDING(_pad2_);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 23750cec0036..6b976829e9f7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -477,6 +477,16 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid)
 	struct mem_cgroup_per_node *mz;
 	struct mem_cgroup_tree_per_node *mctz;
 
+	if (lru_gen_enabled()) {
+		struct lruvec *lruvec = &memcg->nodeinfo[nid]->lruvec;
+
+		/* see the comment on MEMCG_NR_GENS */
+		if (soft_limit_excess(memcg) && lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
+			lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
+
+		return;
+	}
+
 	mctz = soft_limit_tree.rb_tree_per_node[nid];
 	if (!mctz)
 		return;
@@ -3526,6 +3536,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 	struct mem_cgroup_tree_per_node *mctz;
 	unsigned long excess;
 
+	if (lru_gen_enabled())
+		return 0;
+
 	if (order > 0)
 		return 0;
 
@@ -5371,6 +5384,7 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	if (unlikely(mem_cgroup_is_root(memcg)))
 		queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
 				   2UL*HZ);
+	lru_gen_online_memcg(memcg);
 	return 0;
 offline_kmem:
 	memcg_offline_kmem(memcg);
@@ -5402,6 +5416,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	memcg_offline_kmem(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
+	lru_gen_offline_memcg(memcg);
 
 	drain_all_stock(memcg);
 
@@ -5413,6 +5428,7 @@ static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
 	invalidate_reclaim_iterators(memcg);
+	lru_gen_release_memcg(memcg);
 }
 
 static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4c81224508..0aa134b8dae2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7914,6 +7914,7 @@ static void __init free_area_init_node(int nid)
 	pgdat_set_deferred_range(pgdat);
 
 	free_area_init_core(pgdat);
+	lru_gen_init_pgdat(pgdat);
 }
 
 static void __init free_area_init_memoryless_node(int nid)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0557adce75c5..44506eb96c9d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -55,6 +55,7 @@
 #include <linux/ctype.h>
 #include <linux/debugfs.h>
 #include <linux/khugepaged.h>
+#include <linux/rculist_nulls.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -135,11 +136,6 @@ struct scan_control {
 	/* Always discard instead of demoting to lower tier memory */
 	unsigned int no_demotion:1;
 
-#ifdef CONFIG_LRU_GEN
-	/* help kswapd make better choices among multiple memcgs */
-	unsigned long last_reclaimed;
-#endif
-
 	/* Allocation order */
 	s8 order;
 
@@ -3167,6 +3163,9 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(lru_gen_caps, NR_LRU_GEN_CAPS);
 		for ((type) = 0; (type) < ANON_AND_FILE; (type)++)	\
 			for ((zone) = 0; (zone) < MAX_NR_ZONES; (zone)++)
 
+#define get_memcg_gen(seq)	((seq) % MEMCG_NR_GENS)
+#define get_memcg_bin(bin)	((bin) % MEMCG_NR_BINS)
+
 static struct lruvec *get_lruvec(struct mem_cgroup *memcg, int nid)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
@@ -4438,8 +4437,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 		if (sc->priority <= DEF_PRIORITY - 2)
 			wait_event_killable(lruvec->mm_state.wait,
 					    max_seq < READ_ONCE(lrugen->max_seq));
-
-		return max_seq < READ_ONCE(lrugen->max_seq);
+		return false;
 	}
 
 	VM_WARN_ON_ONCE(max_seq != READ_ONCE(lrugen->max_seq));
@@ -4512,8 +4510,6 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 
 	VM_WARN_ON_ONCE(!current_is_kswapd());
 
-	sc->last_reclaimed = sc->nr_reclaimed;
-
 	/* check the order to exclude compaction-induced reclaim */
 	if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
 		return;
@@ -5102,8 +5098,7 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
  * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
  *    reclaim.
  */
-static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
-				    bool can_swap)
+static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool can_swap)
 {
 	unsigned long nr_to_scan;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
@@ -5120,10 +5115,8 @@ static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *
 	if (sc->priority == DEF_PRIORITY)
 		return nr_to_scan;
 
-	try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false);
-
 	/* skip this lruvec as it's low on cold folios */
-	return 0;
+	return try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false) ? -1 : 0;
 }
 
 static unsigned long get_nr_to_reclaim(struct scan_control *sc)
@@ -5132,29 +5125,18 @@ static unsigned long get_nr_to_reclaim(struct scan_control *sc)
 	if (!global_reclaim(sc))
 		return -1;
 
-	/* discount the previous progress for kswapd */
-	if (current_is_kswapd())
-		return sc->nr_to_reclaim + sc->last_reclaimed;
-
 	return max(sc->nr_to_reclaim, compact_gap(sc->order));
 }
 
-static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
+static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
-	struct blk_plug plug;
+	long nr_to_scan;
 	unsigned long scanned = 0;
 	unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
 
-	lru_add_drain();
-
-	blk_start_plug(&plug);
-
-	set_mm_walk(lruvec_pgdat(lruvec));
-
 	while (true) {
 		int delta;
 		int swappiness;
-		unsigned long nr_to_scan;
 
 		if (sc->may_swap)
 			swappiness = get_swappiness(lruvec, sc);
@@ -5164,7 +5146,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 			swappiness = 0;
 
 		nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
-		if (!nr_to_scan)
+		if (nr_to_scan <= 0)
 			break;
 
 		delta = evict_folios(lruvec, sc, swappiness);
@@ -5181,10 +5163,251 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		cond_resched();
 	}
 
+	/* whether try_to_inc_max_seq() was successful */
+	return nr_to_scan < 0;
+}
+
+static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
+{
+	bool success;
+	unsigned long scanned = sc->nr_scanned;
+	unsigned long reclaimed = sc->nr_reclaimed;
+	int seg = lru_gen_memcg_seg(lruvec);
+	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+	/* see the comment on MEMCG_NR_GENS */
+	if (!lruvec_is_sizable(lruvec, sc))
+		return seg != MEMCG_LRU_TAIL ? MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
+
+	mem_cgroup_calculate_protection(NULL, memcg);
+
+	if (mem_cgroup_below_min(memcg))
+		return MEMCG_LRU_YOUNG;
+
+	if (mem_cgroup_below_low(memcg)) {
+		/* see the comment on MEMCG_NR_GENS */
+		if (seg != MEMCG_LRU_TAIL)
+			return MEMCG_LRU_TAIL;
+
+		memcg_memory_event(memcg, MEMCG_LOW);
+	}
+
+	success = try_to_shrink_lruvec(lruvec, sc);
+
+	shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
+
+	if (!sc->proactive)
+		vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
+			   sc->nr_reclaimed - reclaimed);
+
+	sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
+	current->reclaim_state->reclaimed_slab = 0;
+
+	return success ? MEMCG_LRU_YOUNG : 0;
+}
+
+#ifdef CONFIG_MEMCG
+
+static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
+{
+	int gen;
+	int bin;
+	int first_bin;
+	struct lruvec *lruvec;
+	struct lru_gen_folio *lrugen;
+	const struct hlist_nulls_node *pos;
+	int op = 0;
+	struct mem_cgroup *memcg = NULL;
+	unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
+
+	bin = first_bin = prandom_u32_max(MEMCG_NR_BINS);
+restart:
+	gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
+
+	rcu_read_lock();
+
+	hlist_nulls_for_each_entry_rcu(lrugen, pos, &pgdat->memcg_lru.fifo[gen][bin], list) {
+		if (op)
+			lru_gen_rotate_memcg(lruvec, op);
+
+		mem_cgroup_put(memcg);
+
+		lruvec = container_of(lrugen, struct lruvec, lrugen);
+		memcg = lruvec_memcg(lruvec);
+
+		if (!mem_cgroup_tryget(memcg)) {
+			op = 0;
+			memcg = NULL;
+			continue;
+		}
+
+		rcu_read_unlock();
+
+		op = shrink_one(lruvec, sc);
+
+		if (sc->nr_reclaimed >= nr_to_reclaim)
+			goto success;
+
+		rcu_read_lock();
+	}
+
+	rcu_read_unlock();
+
+	/* restart if raced with lru_gen_rotate_memcg() */
+	if (gen != get_nulls_value(pos))
+		goto restart;
+
+	/* try the rest of the bins of the current generation */
+	bin = get_memcg_bin(bin + 1);
+	if (bin != first_bin)
+		goto restart;
+success:
+	if (op)
+		lru_gen_rotate_memcg(lruvec, op);
+
+	mem_cgroup_put(memcg);
+}
+
+static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
+{
+	struct blk_plug plug;
+
+	VM_WARN_ON_ONCE(global_reclaim(sc));
+
+	lru_add_drain();
+
+	blk_start_plug(&plug);
+
+	set_mm_walk(lruvec_pgdat(lruvec));
+
+	if (try_to_shrink_lruvec(lruvec, sc))
+		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
+
+	clear_mm_walk();
+
+	blk_finish_plug(&plug);
+}
+
+#else /* !CONFIG_MEMCG */
+
+static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
+{
+	BUILD_BUG();
+}
+
+static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
+{
+	BUILD_BUG();
+}
+
+#endif
+
+static void set_initial_priority(struct pglist_data *pgdat, struct scan_control *sc)
+{
+	int priority;
+	unsigned long reclaimable;
+	struct lruvec *lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
+
+	if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
+		return;
+	/*
+	 * Determine the initial priority based on ((total / MEMCG_NR_GENS) >>
+	 * priority) * reclaimed_to_scanned_ratio = nr_to_reclaim, where the
+	 * estimated reclaimed_to_scanned_ratio = inactive / total.
+	 */
+	reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
+	if (get_swappiness(lruvec, sc))
+		reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
+
+	reclaimable /= MEMCG_NR_GENS;
+
+	/* round down reclaimable and round up sc->nr_to_reclaim */
+	priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
+
+	sc->priority = clamp(priority, 0, DEF_PRIORITY);
+}
+
+static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
+{
+	struct blk_plug plug;
+	unsigned long reclaimed = sc->nr_reclaimed;
+
+	VM_WARN_ON_ONCE(!global_reclaim(sc));
+
+	lru_add_drain();
+
+	blk_start_plug(&plug);
+
+	set_mm_walk(pgdat);
+
+	set_initial_priority(pgdat, sc);
+
+	if (current_is_kswapd())
+		sc->nr_reclaimed = 0;
+
+	if (mem_cgroup_disabled())
+		shrink_one(&pgdat->__lruvec, sc);
+	else
+		shrink_many(pgdat, sc);
+
+	if (current_is_kswapd())
+		sc->nr_reclaimed += reclaimed;
+
 	clear_mm_walk();
 
 	blk_finish_plug(&plug);
+
+	/* kswapd should never fail */
+	pgdat->kswapd_failures = 0;
+}
+
+#ifdef CONFIG_MEMCG
+void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+	int seg;
+	int old, new;
+	int bin = prandom_u32_max(MEMCG_NR_BINS);
+	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+	spin_lock(&pgdat->memcg_lru.lock);
+
+	VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+	seg = 0;
+	new = old = lruvec->lrugen.gen;
+
+	/* see the comment on MEMCG_NR_GENS */
+	if (op == MEMCG_LRU_HEAD)
+		seg = MEMCG_LRU_HEAD;
+	else if (op == MEMCG_LRU_TAIL)
+		seg = MEMCG_LRU_TAIL;
+	else if (op == MEMCG_LRU_OLD)
+		new = get_memcg_gen(pgdat->memcg_lru.seq);
+	else if (op == MEMCG_LRU_YOUNG)
+		new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
+	else
+		VM_WARN_ON_ONCE(true);
+
+	hlist_nulls_del_rcu(&lruvec->lrugen.list);
+
+	if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
+		hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+	else
+		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+
+	pgdat->memcg_lru.nr_memcgs[old]--;
+	pgdat->memcg_lru.nr_memcgs[new]++;
+
+	lruvec->lrugen.gen = new;
+	WRITE_ONCE(lruvec->lrugen.seg, seg);
+
+	if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
+		WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+	spin_unlock(&pgdat->memcg_lru.lock);
 }
+#endif
 
 /******************************************************************************
  *                          state change
@@ -5639,11 +5862,11 @@ static int run_cmd(char cmd, int memcg_id, int nid, unsigned long seq,
 
 	if (!mem_cgroup_disabled()) {
 		rcu_read_lock();
+
 		memcg = mem_cgroup_from_id(memcg_id);
-#ifdef CONFIG_MEMCG
-		if (memcg && !css_tryget(&memcg->css))
+		if (!mem_cgroup_tryget(memcg))
 			memcg = NULL;
-#endif
+
 		rcu_read_unlock();
 
 		if (!memcg)
@@ -5791,6 +6014,19 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
 }
 
 #ifdef CONFIG_MEMCG
+
+void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+	int i, j;
+
+	spin_lock_init(&pgdat->memcg_lru.lock);
+
+	for (i = 0; i < MEMCG_NR_GENS; i++) {
+		for (j = 0; j < MEMCG_NR_BINS; j++)
+			INIT_HLIST_NULLS_HEAD(&pgdat->memcg_lru.fifo[i][j], i);
+	}
+}
+
 void lru_gen_init_memcg(struct mem_cgroup *memcg)
 {
 	INIT_LIST_HEAD(&memcg->mm_list.fifo);
@@ -5814,7 +6050,69 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg)
 		}
 	}
 }
-#endif
+
+void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+	int gen;
+	int nid;
+	int bin = prandom_u32_max(MEMCG_NR_BINS);
+
+	for_each_node(nid) {
+		struct pglist_data *pgdat = NODE_DATA(nid);
+		struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+		spin_lock(&pgdat->memcg_lru.lock);
+
+		VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+		gen = get_memcg_gen(pgdat->memcg_lru.seq);
+
+		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
+		pgdat->memcg_lru.nr_memcgs[gen]++;
+
+		lruvec->lrugen.gen = gen;
+
+		spin_unlock(&pgdat->memcg_lru.lock);
+	}
+}
+
+void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+	int nid;
+
+	for_each_node(nid) {
+		struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
+	}
+}
+
+void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+	int gen;
+	int nid;
+
+	for_each_node(nid) {
+		struct pglist_data *pgdat = NODE_DATA(nid);
+		struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+		spin_lock(&pgdat->memcg_lru.lock);
+
+		VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+		gen = lruvec->lrugen.gen;
+
+		hlist_nulls_del_rcu(&lruvec->lrugen.list);
+		pgdat->memcg_lru.nr_memcgs[gen]--;
+
+		if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
+			WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+		spin_unlock(&pgdat->memcg_lru.lock);
+	}
+}
+
+#endif /* CONFIG_MEMCG */
 
 static int __init init_lru_gen(void)
 {
@@ -5841,6 +6139,10 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 {
 }
 
+static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
+{
+}
+
 #endif /* CONFIG_LRU_GEN */
 
 static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
@@ -5854,7 +6156,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 	bool proportional_reclaim;
 	struct blk_plug plug;
 
-	if (lru_gen_enabled()) {
+	if (lru_gen_enabled() && !global_reclaim(sc)) {
 		lru_gen_shrink_lruvec(lruvec, sc);
 		return;
 	}
@@ -6097,6 +6399,11 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	struct lruvec *target_lruvec;
 	bool reclaimable = false;
 
+	if (lru_gen_enabled() && global_reclaim(sc)) {
+		lru_gen_shrink_node(pgdat, sc);
+		return;
+	}
+
 	target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
 
 again:
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread
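
A minimal userspace sketch of the initial-priority arithmetic in
set_initial_priority() from the patch above: priority is picked so that
(reclaimable >> priority) roughly equals sc->nr_to_reclaim, by comparing
the positions of the highest set bits. The fls_long() below is a
userspace stand-in for the kernel helper, and the clamp to
[0, DEF_PRIORITY] is omitted.

  #include <stdio.h>

  /* userspace stand-in for the kernel's fls_long(): 1-based index of the
   * highest set bit, 0 for an all-zero input */
  static int fls_long(unsigned long x)
  {
      int i = 0;

      while (x) {
          i++;
          x >>= 1;
      }
      return i;
  }

  int main(void)
  {
      /* e.g., 2GB of reclaimable 4KB pages, split over MEMCG_NR_GENS == 2 */
      unsigned long reclaimable = 524288 / 2;
      unsigned long nr_to_reclaim = 1024;

      /* round down reclaimable and round up nr_to_reclaim */
      int priority = fls_long(reclaimable) - 1 - fls_long(nr_to_reclaim - 1);

      printf("%d\n", priority);    /* prints 8: (262144 >> 8) == 1024 */
      return 0;
  }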

* [PATCH mm-unstable v1 7/8] mm: multi-gen LRU: clarify scan_control flags
  2022-12-01 22:39 [PATCH mm-unstable v1 0/8] mm: multi-gen LRU: memcg LRU Yu Zhao
                   ` (5 preceding siblings ...)
  2022-12-01 22:39 ` [PATCH mm-unstable v1 6/8] mm: multi-gen LRU: per-node lru_gen_folio lists Yu Zhao
@ 2022-12-01 22:39 ` Yu Zhao
  2022-12-02  4:17   ` Hillf Danton
  2022-12-01 22:39 ` [PATCH mm-unstable v1 8/8] mm: multi-gen LRU: simplify arch_has_hw_pte_young() check Yu Zhao
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Yu Zhao @ 2022-12-01 22:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
	Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm,
	linux-kernel, linux-mm, Yu Zhao

Among the flags in scan_control:
1. sc->may_swap, which indicates swap constraint due to memsw.max, is
   supported as usual.
2. sc->proactive, which indicates reclaim by memory.reclaim, may not
   opportunistically skip the aging path, since it is considered less
   latency sensitive.
3. !(sc->gfp_mask & __GFP_IO), which indicates IO constraint,
   prioritizes file LRU, since clean file folios are more likely to
   exist.
4. sc->may_writepage and sc->may_unmap, which indicate opportunistic
   reclaim, are rejected, since unmapped clean folios are already
   prioritized. Scanning for more of them is likely futile and can
   cause high reclaim latency when there is a large number of memcgs.

The rest are handled by the existing code.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/vmscan.c | 53 +++++++++++++++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 26 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 44506eb96c9d..39724e7ae837 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3191,6 +3191,9 @@ static int get_swappiness(struct lruvec *lruvec, struct scan_control *sc)
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
+	if (!sc->may_swap)
+		return 0;
+
 	if (!can_demote(pgdat->node_id, sc) &&
 	    mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
 		return 0;
@@ -4220,7 +4223,7 @@ static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_
 	} while (err == -EAGAIN);
 }
 
-static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat)
+static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat, bool force_alloc)
 {
 	struct lru_gen_mm_walk *walk = current->reclaim_state->mm_walk;
 
@@ -4228,7 +4231,7 @@ static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat)
 		VM_WARN_ON_ONCE(walk);
 
 		walk = &pgdat->mm_walk;
-	} else if (!pgdat && !walk) {
+	} else if (!walk && force_alloc) {
 		VM_WARN_ON_ONCE(current_is_kswapd());
 
 		walk = kzalloc(sizeof(*walk), __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
@@ -4414,7 +4417,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 		goto done;
 	}
 
-	walk = set_mm_walk(NULL);
+	walk = set_mm_walk(NULL, true);
 	if (!walk) {
 		success = iterate_mm_list_nowalk(lruvec, max_seq);
 		goto done;
@@ -4483,8 +4486,6 @@ static bool lruvec_is_reclaimable(struct lruvec *lruvec, struct scan_control *sc
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	DEFINE_MIN_SEQ(lruvec);
 
-	VM_WARN_ON_ONCE(sc->memcg_low_reclaim);
-
 	/* see the comment on lru_gen_folio */
 	gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
 	birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
@@ -4740,12 +4741,8 @@ static bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct sca
 {
 	bool success;
 
-	/* unmapping inhibited */
-	if (!sc->may_unmap && folio_mapped(folio))
-		return false;
-
 	/* swapping inhibited */
-	if (!(sc->may_writepage && (sc->gfp_mask & __GFP_IO)) &&
+	if (!(sc->gfp_mask & __GFP_IO) &&
 	    (folio_test_dirty(folio) ||
 	     (folio_test_anon(folio) && !folio_test_swapcache(folio))))
 		return false;
@@ -4842,9 +4839,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 	__count_vm_events(PGSCAN_ANON + type, isolated);
 
 	/*
-	 * There might not be eligible pages due to reclaim_idx, may_unmap and
-	 * may_writepage. Check the remaining to prevent livelock if it's not
-	 * making progress.
+	 * There might not be eligible folios due to reclaim_idx. Check the
+	 * remaining to prevent livelock if it's not making progress.
 	 */
 	return isolated || !remaining ? scanned : 0;
 }
@@ -5104,8 +5100,7 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
 
-	if (mem_cgroup_below_min(memcg) ||
-	    (mem_cgroup_below_low(memcg) && !sc->memcg_low_reclaim))
+	if (mem_cgroup_below_min(memcg))
 		return 0;
 
 	if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
@@ -5133,17 +5128,14 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 	long nr_to_scan;
 	unsigned long scanned = 0;
 	unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
+	int swappiness = get_swappiness(lruvec, sc);
+
+	/* clean file folios are more likely to exist */
+	if (swappiness && !(sc->gfp_mask & __GFP_IO))
+		swappiness = 1;
 
 	while (true) {
 		int delta;
-		int swappiness;
-
-		if (sc->may_swap)
-			swappiness = get_swappiness(lruvec, sc);
-		else if (!cgroup_reclaim(sc) && get_swappiness(lruvec, sc))
-			swappiness = 1;
-		else
-			swappiness = 0;
 
 		nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
 		if (nr_to_scan <= 0)
@@ -5274,12 +5266,13 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 	struct blk_plug plug;
 
 	VM_WARN_ON_ONCE(global_reclaim(sc));
+	VM_WARN_ON_ONCE(!sc->may_writepage || !sc->may_unmap);
 
 	lru_add_drain();
 
 	blk_start_plug(&plug);
 
-	set_mm_walk(lruvec_pgdat(lruvec));
+	set_mm_walk(NULL, sc->proactive);
 
 	if (try_to_shrink_lruvec(lruvec, sc))
 		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
@@ -5335,11 +5328,19 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
 
 	VM_WARN_ON_ONCE(!global_reclaim(sc));
 
+	/*
+	 * Unmapped clean folios are already prioritized. Scanning for more of
+	 * them is likely futile and can cause high reclaim latency when there
+	 * is a large number of memcgs.
+	 */
+	if (!sc->may_writepage || !sc->may_unmap)
+		return;
+
 	lru_add_drain();
 
 	blk_start_plug(&plug);
 
-	set_mm_walk(pgdat);
+	set_mm_walk(pgdat, sc->proactive);
 
 	set_initial_priority(pgdat, sc);
 
@@ -5926,7 +5927,7 @@ static ssize_t lru_gen_seq_write(struct file *file, const char __user *src,
 	set_task_reclaim_state(current, &sc.reclaim_state);
 	flags = memalloc_noreclaim_save();
 	blk_start_plug(&plug);
-	if (!set_mm_walk(NULL)) {
+	if (!set_mm_walk(NULL, true)) {
 		err = -ENOMEM;
 		goto done;
 	}
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread
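
A pared-down model, not the kernel code, of how this patch folds the
scan_control checks together: sc->may_swap now gates swappiness inside
get_swappiness(), and an IO-constrained context clamps a nonzero
swappiness to 1, preferring the file LRU without ruling out anon
entirely. The struct below is a stand-in with only the two fields that
matter here, and the __GFP_IO value is illustrative.

  #include <stdbool.h>
  #include <stdio.h>

  #define __GFP_IO 0x40u    /* illustrative; the real value is kernel-internal */

  struct scan_control_model {    /* stand-in for struct scan_control */
      unsigned int gfp_mask;
      bool may_swap;
  };

  static int get_swappiness_model(const struct scan_control_model *sc,
                                  int memcg_swappiness)
  {
      if (!sc->may_swap)
          return 0;    /* swap constraint, e.g., due to memsw.max */

      return memcg_swappiness;
  }

  static int effective_swappiness(const struct scan_control_model *sc,
                                  int memcg_swappiness)
  {
      int swappiness = get_swappiness_model(sc, memcg_swappiness);

      /* clean file folios are more likely to exist */
      if (swappiness && !(sc->gfp_mask & __GFP_IO))
          swappiness = 1;

      return swappiness;
  }

  int main(void)
  {
      struct scan_control_model sc = { .gfp_mask = 0, .may_swap = true };

      printf("%d\n", effective_swappiness(&sc, 60));    /* prints 1 */
      return 0;
  }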

* [PATCH mm-unstable v1 8/8] mm: multi-gen LRU: simplify arch_has_hw_pte_young() check
  2022-12-01 22:39 [PATCH mm-unstable v1 0/8] mm: multi-gen LRU: memcg LRU Yu Zhao
                   ` (6 preceding siblings ...)
  2022-12-01 22:39 ` [PATCH mm-unstable v1 7/8] mm: multi-gen LRU: clarify scan_control flags Yu Zhao
@ 2022-12-01 22:39 ` Yu Zhao
  2022-12-20 21:49 ` JavaScript / Ampere Altra benchmark with MGLRU Yu Zhao
  2022-12-21  0:07 ` Java / POWER9 " Yu Zhao
  9 siblings, 0 replies; 14+ messages in thread
From: Yu Zhao @ 2022-12-01 22:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
	Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm,
	linux-kernel, linux-mm, Yu Zhao

Scanning page tables when hardware does not set the accessed bit has
no real use cases.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 39724e7ae837..5994592c55fd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4412,7 +4412,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 	 * handful of PTEs. Spreading the work out over a period of time usually
 	 * is less efficient, but it avoids bursty page faults.
 	 */
-	if (!force_scan && !(arch_has_hw_pte_young() && get_cap(LRU_GEN_MM_WALK))) {
+	if (!arch_has_hw_pte_young() || !get_cap(LRU_GEN_MM_WALK)) {
 		success = iterate_mm_list_nowalk(lruvec, max_seq);
 		goto done;
 	}
-- 
2.39.0.rc0.267.gcb52ba06e7-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread
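
One detail worth calling out, since the hunk reads like a pure De Morgan
rewrite: the old predicate let force_scan reach the page-table walk even
without the hardware accessed bit, while the new one always takes the
iterate_mm_list_nowalk() fallback in that case, which is exactly the
behavior the commit message argues for. A quick enumeration (userspace,
with the three inputs reduced to plain booleans):

  #include <stdbool.h>
  #include <stdio.h>

  int main(void)
  {
      /* "1" means: fall back to iterate_mm_list_nowalk() */
      for (int i = 0; i < 8; i++) {
          bool force_scan = i & 4;
          bool arch = i & 2;    /* arch_has_hw_pte_young() */
          bool cap = i & 1;     /* get_cap(LRU_GEN_MM_WALK) */

          bool before = !force_scan && !(arch && cap);
          bool after = !arch || !cap;

          if (before != after)
              printf("force_scan=%d arch=%d cap=%d: before=%d after=%d\n",
                     force_scan, arch, cap, before, after);
      }
      return 0;
  }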

* Re: [PATCH mm-unstable v1 7/8] mm: multi-gen LRU: clarify scan_control flags
  2022-12-01 22:39 ` [PATCH mm-unstable v1 7/8] mm: multi-gen LRU: clarify scan_control flags Yu Zhao
@ 2022-12-02  4:17   ` Hillf Danton
  0 siblings, 0 replies; 14+ messages in thread
From: Hillf Danton @ 2022-12-02  4:17 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Andrew Morton, Johannes Weiner, Jonathan Corbet, Michael Larabel,
	Michal Hocko, Mike Rapoport, Roman Gushchin, Suren Baghdasaryan,
	linux-mm, linux-kernel, linux-mm

On 1 Dec 2022 15:39:23 -0700 Yu Zhao <yuzhao@google.com>
> Among the flags in scan_control:
> 1. sc->may_swap, which indicates swap constraint due to memsw.max, is
>    supported as usual.
> 2. sc->proactive, which indicates reclaim by memory.reclaim, may not
>    opportunistically skip the aging path, since it is considered less
>    latency sensitive.
> 3. !(sc->gfp_mask & __GFP_IO), which indicates IO constraint,
>    prioritizes file LRU, since clean file folios are more likely to
>    exist.
> 4. sc->may_writepage and sc->may_unmap, which indicate opportunistic
>    reclaim, are rejected, since unmapped clean folios are already
>    prioritized. Scanning for more of them is likely futile and can
>    cause high reclaim latency when there is a large number of memcgs.

Nit: gfp without __GFP_IO set does not mean there are likely more clean
page caches, though prioritized, on the local NUMA node than on a remote
one, and vice versa.

Hillf

/**
 * memalloc_noio_save - Marks implicit GFP_NOIO allocation scope.
 *
 * This functions marks the beginning of the GFP_NOIO allocation scope.
 * All further allocations will implicitly drop __GFP_IO flag and so
 * they are safe for the IO critical section from the allocation recursion
 * point of view. Use memalloc_noio_restore to end the scope with flags
 * returned by this function.
 *
 * This function is safe to be used from any context.
 */


^ permalink raw reply	[flat|nested] 14+ messages in thread
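
As a small usage sketch of the scope API quoted above (kernel context,
not a complete example): any allocation between save and restore
implicitly loses __GFP_IO (and __GFP_FS), which is one way reclaim ends
up on the IO-constrained, file-LRU-prioritized path this patch
describes.

  /* kernel-context sketch; assumes <linux/sched/mm.h> and <linux/slab.h> */
  static void *alloc_in_io_critical_section(size_t size)
  {
      unsigned int noio_flags = memalloc_noio_save();
      void *p;

      /* GFP_KERNEL here behaves as if __GFP_IO and __GFP_FS were clear */
      p = kmalloc(size, GFP_KERNEL);

      memalloc_noio_restore(noio_flags);
      return p;
  }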

* Re: [PATCH mm-unstable v1 6/8] mm: multi-gen LRU: per-node lru_gen_folio lists
  2022-12-01 22:39 ` [PATCH mm-unstable v1 6/8] mm: multi-gen LRU: per-node lru_gen_folio lists Yu Zhao
@ 2022-12-03  4:20   ` Hillf Danton
  0 siblings, 0 replies; 14+ messages in thread
From: Hillf Danton @ 2022-12-03  4:20 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Andrew Morton, Johannes Weiner, Jonathan Corbet, Michael Larabel,
	Michal Hocko, Mike Rapoport, Roman Gushchin, Suren Baghdasaryan,
	linux-mm, linux-kernel, linux-mm

On 1 Dec 2022 15:39:22 -0700 Yu Zhao <yuzhao@google.com>
> @@ -477,6 +477,16 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid)
>  	struct mem_cgroup_per_node *mz;
>  	struct mem_cgroup_tree_per_node *mctz;
>  
> +	if (lru_gen_enabled()) {
> +		struct lruvec *lruvec = &memcg->nodeinfo[nid]->lruvec;
> +
> +		/* see the comment on MEMCG_NR_GENS */
> +		if (soft_limit_excess(memcg) && lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
> +			lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
> +
> +		return;

The rotation heuristic is a weak signal amid the background noise
produced by prandom_u32_max(MEMCG_NR_BINS); I wonder whether the memcg
LRU would work just as well without it.

> +	}
> +
>  	mctz = soft_limit_tree.rb_tree_per_node[nid];
>  	if (!mctz)
>  		return;
> @@ -3526,6 +3536,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
>  	struct mem_cgroup_tree_per_node *mctz;
>  	unsigned long excess;
>  
> +	if (lru_gen_enabled())
> +		return 0;
> +
>  	if (order > 0)
>  		return 0;
>  


^ permalink raw reply	[flat|nested] 14+ messages in thread
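
A toy userspace model of the noise being described: lru_gen_rotate_memcg()
picks the destination bin with prandom_u32_max(MEMCG_NR_BINS), and
shrink_many() independently picks a random bin to start from, so the
MEMCG_LRU_HEAD promotion only changes where a memcg sits within one of
eight bins. The stand-in for prandom_u32_max() below uses plain rand();
the constant mirrors the patch.

  #include <stdio.h>
  #include <stdlib.h>

  #define MEMCG_NR_BINS 8

  /* userspace stand-in for the kernel's prandom_u32_max() */
  static unsigned int prandom_u32_max(unsigned int ceil)
  {
      return (unsigned int)rand() % ceil;
  }

  int main(void)
  {
      /* where a soft-limit-excess memcg lands after MEMCG_LRU_HEAD ... */
      unsigned int dst_bin = prandom_u32_max(MEMCG_NR_BINS);
      /* ... vs. where a reclaimer starts walking the bins */
      unsigned int first_bin = prandom_u32_max(MEMCG_NR_BINS);

      /* the two coincide 1/MEMCG_NR_BINS of the time; a promoted memcg is
       * still reached early on average because all bins of the current
       * generation are walked before restarting */
      printf("dst=%u first=%u same=%d\n", dst_bin, first_bin,
             dst_bin == first_bin);
      return 0;
  }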

* Re: [PATCH mm-unstable v1 3/8] mm: multi-gen LRU: remove eviction fairness safeguard
  2022-12-01 22:39 ` [PATCH mm-unstable v1 3/8] mm: multi-gen LRU: remove eviction fairness safeguard Yu Zhao
@ 2022-12-11  3:59   ` Chen Wandun
  0 siblings, 0 replies; 14+ messages in thread
From: Chen Wandun @ 2022-12-11  3:59 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
	Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm,
	linux-kernel, linux-mm, Andrew Morton



On 2022/12/2 6:39, Yu Zhao wrote:
> Recall that the eviction consumes the oldest generation: first it
> bucket-sorts folios whose gen counters were updated by the aging and
> reclaims the rest; then it increments lrugen->min_seq.
>
> The current eviction fairness safeguard for global reclaim has a
> dilemma: when there are multiple eligible memcgs, should it continue
> or stop upon meeting the reclaim goal? If it continues, it overshoots
> and increases direct reclaim latency; if it stops, it loses fairness
> between memcgs it has taken memory away from and those it has yet to.
>
> With memcg LRU, the eviction, while ensuring eventual fairness, will
> stop upon meeting its goal. Therefore the current eviction fairness
> safeguard for global reclaim will not be needed.
>
> Note that memcg LRU only applies to global reclaim. For memcg reclaim,
> the eviction will continue, even if it is overshooting. This becomes
> unconditional due to code simplification.
>
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>   mm/vmscan.c | 81 +++++++++++++++--------------------------------------
>   1 file changed, 23 insertions(+), 58 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ebab1ec3d400..d714a777c88b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -449,6 +449,11 @@ static bool cgroup_reclaim(struct scan_control *sc)
>   	return sc->target_mem_cgroup;
>   }
>   
> +static bool global_reclaim(struct scan_control *sc)
> +{
> +	return !sc->target_mem_cgroup || mem_cgroup_is_root(sc->target_mem_cgroup);
> +}
> +
>   /**
>    * writeback_throttling_sane - is the usual dirty throttling mechanism available?
>    * @sc: scan_control in question
> @@ -499,6 +504,11 @@ static bool cgroup_reclaim(struct scan_control *sc)
>   	return false;
>   }
>   
> +static bool global_reclaim(struct scan_control *sc)
> +{
> +	return true;
> +}
> +
>   static bool writeback_throttling_sane(struct scan_control *sc)
>   {
>   	return true;
> @@ -4991,8 +5001,7 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
>   	return scanned;
>   }
>   
> -static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
> -			bool *need_swapping)
> +static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
>   {
>   	int type;
>   	int scanned;
> @@ -5081,9 +5090,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>   		goto retry;
>   	}
>   
> -	if (need_swapping && type == LRU_GEN_ANON)
> -		*need_swapping = true;
> -
>   	return scanned;
>   }
>   
> @@ -5122,67 +5128,26 @@ static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *
>   	return min_seq[!can_swap] + MIN_NR_GENS <= max_seq ? nr_to_scan : 0;
>   }
>   
> -static bool should_abort_scan(struct lruvec *lruvec, unsigned long seq,
> -			      struct scan_control *sc, bool need_swapping)
> +static unsigned long get_nr_to_reclaim(struct scan_control *sc)
>   {
> -	int i;
> -	DEFINE_MAX_SEQ(lruvec);
> +	/* don't abort memcg reclaim to ensure fairness */
> +	if (!global_reclaim(sc))
> +		return -1;
The return type of the function is unsigned long. Does returning -1
mean something else?
>   
> -	if (!current_is_kswapd()) {
> -		/* age each memcg at most once to ensure fairness */
> -		if (max_seq - seq > 1)
> -			return true;
> +	/* discount the previous progress for kswapd */
> +	if (current_is_kswapd())
> +		return sc->nr_to_reclaim + sc->last_reclaimed;
>   
> -		/* over-swapping can increase allocation latency */
> -		if (sc->nr_reclaimed >= sc->nr_to_reclaim && need_swapping)
> -			return true;
> -
> -		/* give this thread a chance to exit and free its memory */
> -		if (fatal_signal_pending(current)) {
> -			sc->nr_reclaimed += MIN_LRU_BATCH;
> -			return true;
> -		}
> -
> -		if (cgroup_reclaim(sc))
> -			return false;
> -	} else if (sc->nr_reclaimed - sc->last_reclaimed < sc->nr_to_reclaim)
> -		return false;
> -
> -	/* keep scanning at low priorities to ensure fairness */
> -	if (sc->priority > DEF_PRIORITY - 2)
> -		return false;
> -
> -	/*
> -	 * A minimum amount of work was done under global memory pressure. For
> -	 * kswapd, it may be overshooting. For direct reclaim, the allocation
> -	 * may succeed if all suitable zones are somewhat safe. In either case,
> -	 * it's better to stop now, and restart later if necessary.
> -	 */
> -	for (i = 0; i <= sc->reclaim_idx; i++) {
> -		unsigned long wmark;
> -		struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
> -
> -		if (!managed_zone(zone))
> -			continue;
> -
> -		wmark = current_is_kswapd() ? high_wmark_pages(zone) : low_wmark_pages(zone);
> -		if (wmark > zone_page_state(zone, NR_FREE_PAGES))
> -			return false;
> -	}
> -
> -	sc->nr_reclaimed += MIN_LRU_BATCH;
> -
> -	return true;
> +	return max(sc->nr_to_reclaim, compact_gap(sc->order));
>   }
>   
>   static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>   {
>   	struct blk_plug plug;
>   	bool need_aging = false;
> -	bool need_swapping = false;
>   	unsigned long scanned = 0;
>   	unsigned long reclaimed = sc->nr_reclaimed;
> -	DEFINE_MAX_SEQ(lruvec);
> +	unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
>   
>   	lru_add_drain();
>   
> @@ -5206,7 +5171,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
>   		if (!nr_to_scan)
>   			goto done;
>   
> -		delta = evict_folios(lruvec, sc, swappiness, &need_swapping);
> +		delta = evict_folios(lruvec, sc, swappiness);
>   		if (!delta)
>   			goto done;
>   
> @@ -5214,7 +5179,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
>   		if (scanned >= nr_to_scan)
>   			break;
>   
> -		if (should_abort_scan(lruvec, max_seq, sc, need_swapping))
> +		if (sc->nr_reclaimed >= nr_to_reclaim)
>   			break;
>   
>   		cond_resched();
> @@ -5661,7 +5626,7 @@ static int run_eviction(struct lruvec *lruvec, unsigned long seq, struct scan_co
>   		if (sc->nr_reclaimed >= nr_to_reclaim)
>   			return 0;
>   
> -		if (!evict_folios(lruvec, sc, swappiness, NULL))
> +		if (!evict_folios(lruvec, sc, swappiness))
>   			return 0;
>   
>   		cond_resched();


^ permalink raw reply	[flat|nested] 14+ messages in thread
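
On the quoted question about returning -1 from a function whose return
type is unsigned long: the constant converts to ULONG_MAX, which here
deliberately means "no limit", so the later sc->nr_reclaimed >=
nr_to_reclaim check can never abort memcg reclaim early. A quick
userspace demonstration, with the function reduced to the branch in
question:

  #include <limits.h>
  #include <stdio.h>

  static unsigned long get_nr_to_reclaim_model(int global_reclaim,
                                               unsigned long nr_to_reclaim)
  {
      /* don't abort memcg reclaim to ensure fairness */
      if (!global_reclaim)
          return -1;    /* converts to ULONG_MAX: effectively "no limit" */

      return nr_to_reclaim;
  }

  int main(void)
  {
      unsigned long n = get_nr_to_reclaim_model(0, 1024);

      printf("%d\n", n == ULONG_MAX);    /* prints 1 */
      /* any achievable nr_reclaimed is < n, so the early break never fires */
      return 0;
  }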

* JavaScript / Ampere Altra benchmark with MGLRU
  2022-12-01 22:39 [PATCH mm-unstable v1 0/8] mm: multi-gen LRU: memcg LRU Yu Zhao
                   ` (7 preceding siblings ...)
  2022-12-01 22:39 ` [PATCH mm-unstable v1 8/8] mm: multi-gen LRU: simplify arch_has_hw_pte_young() check Yu Zhao
@ 2022-12-20 21:49 ` Yu Zhao
  2022-12-21  0:07 ` Java / POWER9 " Yu Zhao
  9 siblings, 0 replies; 14+ messages in thread
From: Yu Zhao @ 2022-12-20 21:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Darren Hart, Shijie Huang, Jonathan Corbet,
	Linus Torvalds, Michael Larabel, linux-mm

TLDR
====
Simulated the client-server model with Chromium and Node.js pairs:
ran 1 pair per CPU and collected the number of requests (responses)
for each pair over 24 hours. (Swapped less than 10% of DRAM.)

Throughput (number of requests)       MGLRU off    MGLRU on    Change
---------------------------------------------------------------------
Total                                 9490720      16417902    +88%
Node 0                                5609841      8706039     +55%
Node 1                                3880879      7711863     +98%

Tail latency (number of requests)     MGLRU off    MGLRU on    Change
---------------------------------------------------------------------
[128s, inf)                           7            0           NaN
[64s, 128s)                           98           0           NaN
[32s, 64s)                            378          0           NaN
[16s, 32s)                            78112        416         -99%

Fairness (SD over mean requests)      MGLRU off    MGLRU on    Change
---------------------------------------------------------------------
Node 0                                12%          2%          -83%
Node 1                                9%           2%          -77%

Abbreviations
=============
CI:   confidence interval
NS:   no statistically significant difference
DUT:  device under test
ATE:  automatic test equipment

Rationale
=========
1. JavaScript has been the most popular programming language for most
   of the last decade, ranked by pull requests on GitHub [1], and
   "completes its ninth year in a row as the most commonly used
   programming language" according to Stack Overflow [2].
2. ARM has the highest growth rate in the server segment. Google
   Cloud Platform [3], Microsoft Azure and Oracle offer Ampere Altra
   processors as the x86 alternative.
3. Chrome is the most used browser for web application testing,
   according to the 2021 Selenium survey [4]. Selenium is the
   standard tool for web application test automation.
4. Node.js is the standard backend JavaScript runtime environment,
   offered by all major cloud providers.

Hardware
========
DUT $ lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 128
  On-line CPU(s) list:  0-127
Vendor ID:              ARM
  Model name:           Neoverse-N1
    Model:              1
    Thread(s) per core: 1
    Core(s) per socket: 64
    Socket(s):          2
    Stepping:           r3p1
    Frequency boost:    disabled
    CPU max MHz:        2800.0000
    CPU min MHz:        1000.0000
    BogoMIPS:           50.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
Caches (sum of all):
  L1d:                  8 MiB (128 instances)
  L1i:                  8 MiB (128 instances)
  L2:                   128 MiB (128 instances)
NUMA:
  NUMA node(s):         2
  NUMA node0 CPU(s):    0-63
  NUMA node1 CPU(s):    64-127
Vulnerabilities:
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Mmio stale data:      Not affected
  Retbleed:             Not affected
  Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Mitigation; CSV2, BHB
  Srbds:                Not affected
  Tsx async abort:      Not affected

DUT $ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 257730 MB
node 0 free: 256560 MB
node 1 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 1 size: 256877 MB
node 1 free: 256117 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

DUT $ cat /sys/class/nvme/nvme0/model
INTEL SSDPF21Q800GB

DUT $ cat /sys/class/nvme/nvme0/numa_node
0

DUT $ cat /sys/class/nvme/nvme1/model
INTEL SSDPF21Q800GB

DUT $ cat /sys/class/nvme/nvme1/numa_node
1

Software
========
DUT $ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

DUT $ uname -a
Linux arm 6.1.0-rc8+ #1 SMP Sat Dec 10 23:34:43 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

DUT $ cat /proc/swaps
Filename        Type         Size         Used  Priority
/dev/nvme0n1    partition    268435392    0     -2
/dev/nvme1n1    partition    268435392    0     -3

DUT $ node -v
v12.22.9

DUT $ chromedriver -v
ChromeDriver 105.0.5195.102 (4c16f5ffcc2da70ee2600d5db77bed423ac03a5a-refs/branch-heads/5195_55@{#4})

DUT $ python3 -c "import selenium; print(selenium.__version__)"
4.0.0a1

Procedure
=========
DUT $ cat server.js
const chunks = 8;
const size = 1024 * 1024 * 512;
const stride = 512;

const bufs = [];

for (let i = 0; i < chunks; i++) {
    bufs[i] = Buffer.alloc(size);
}

const http = require('http');

const server = http.createServer(function(req, res) {
    if (req.url != '/') {
        res.writeHead(404);
        res.end();
        return;
    }

    const rand = Math.floor(Math.random() * chunks);

    const buf = bufs[rand];
    for (let i = 0; i < buf.length; i += stride) {
        buf[i] = i;
    }

    const html = `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="icon" href="data:,">
<title>memory test</title>
</head>
<body>
<div id="size">${buf.length}</div>
</body>
</html>`;

    res.writeHead(200, {'Content-Type': 'text/html'});
    res.end(html);
}).listen(process.argv[2]);

function exit(sig) {
    server.close(function() {
        process.exit(0);
    });
}

process.on('SIGINT', exit);

DUT $ cat client.py
import signal
import sys
import time
from selenium import webdriver

secs = [0, 1, 2, 4, 8, 16, 32, 64, 128]
hist = dict()

js = '''
const chunks = 4;
const size = 1024 * 128;
const stride = 128;

const rand = Math.floor(Math.random() * chunks);

const buf = new BigInt64Array(size * (chunks + 1));
for (let i = 0; i < buf.length; i += stride) {
    buf[i] = BigInt(i);
}

document.getElementById("size").innerHTML = "0";

return buf.length;
'''


def stop(sig, stack):
    raise KeyboardInterrupt

signal.signal(signal.SIGINT, stop)

try:
    options = webdriver.chrome.options.Options()
    options.headless = True

    driver = webdriver.Chrome(options=options)
    driver.set_script_timeout(600)
    driver.set_page_load_timeout(600)

    driver.get('http://127.0.0.1:' + sys.argv[1])

    for sec in secs:
        hist[sec] = 0

    while True:
        start = time.time()

        driver.refresh()

        size = int(driver.find_element_by_id('size').text)
        assert size > 0

        size = driver.execute_script(js)
        assert size > 0

        elapsed = time.time() - start

        for sec in reversed(secs):
            if elapsed >= sec:
                hist[sec] += 1
                break
except KeyboardInterrupt:
    print('client:', sys.argv[1], 'total: %6d,' % sum(hist.values()),
          ', '.join('%d: %6d' % (k, v) for k, v in hist.items()))

DUT $ cat js_benchmark.sh
echo 0 >/proc/sys/kernel/numa_balancing

nodes=2
memcgs=64

run() {
    trap 'wait' SIGINT

    memcg=$1
    path=/sys/fs/cgroup/memcg$memcg

    mkdir $path
    echo $BASHPID >$path/cgroup.procs

    for ((node = 0; node < $nodes; node++)); do
        port=$((nodes * memcg + node + 8000))

        numactl -N $node -m $node node server.js $port &
    done

    sleep 60

    for ((node = 0; node < $nodes; node++)); do
        port=$((nodes * memcg + node + 8000))

        numactl -N $node -m $node python3 client.py $port &
    done

    wait
}

for ((memcg = 0; memcg < $memcgs; memcg++)); do
    run $memcg &
done

sleep $((24 * 60 * 60))
trap 'wait' SIGINT
kill -INT 0
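
For reference, the client's histogram keys each request by the largest
threshold in secs that its latency meets or exceeds; that is what the
"[16s, 32s)" style rows in the TLDR summarize. The same bucketing,
rendered in C with the thresholds copied from client.py:

  #include <stdio.h>

  static const double secs[] = { 0, 1, 2, 4, 8, 16, 32, 64, 128 };
  #define NBUCKETS ((int)(sizeof(secs) / sizeof(secs[0])))

  /* index of the bucket [secs[i], secs[i + 1]) that contains elapsed */
  static int bucket(double elapsed)
  {
      for (int i = NBUCKETS - 1; i >= 0; i--) {
          if (elapsed >= secs[i])
              return i;
      }
      return 0;
  }

  int main(void)
  {
      unsigned long hist[NBUCKETS] = { 0 };

      /* e.g., a 17.3s request lands in the [16s, 32s) bucket */
      hist[bucket(17.3)]++;

      printf("[16s, 32s): %lu\n", hist[5]);    /* prints 1 */
      return 0;
  }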

Results
=======
MGLRU off
---------
client: 8000 total:  97642, 0: 78143, 1: 17483, 2:  956, 4: 537, 8:  338, 16:  184, 32:  0, 64: 1, 128: 0
client: 8001 total:  55368, 0: 26647, 1: 25128, 2: 1618, 4: 573, 8:  544, 16:  856, 32:  1, 64: 1, 128: 0
client: 8002 total:  81680, 0: 65911, 1: 13180, 2:  778, 4: 532, 8:  498, 16:  779, 32:  2, 64: 0, 128: 0
client: 8003 total:  67382, 0: 32878, 1: 30980, 2: 2096, 4: 713, 8:  434, 16:  271, 32:  9, 64: 1, 128: 0
client: 8004 total:  72633, 0: 58276, 1: 11452, 2:  574, 4: 491, 8:  753, 16: 1086, 32:  0, 64: 1, 128: 0
client: 8005 total:  63859, 0: 32720, 1: 27363, 2: 2152, 4: 620, 8:  495, 16:  507, 32:  1, 64: 1, 128: 0
client: 8006 total:  99194, 0: 77990, 1: 19202, 2: 1008, 4: 567, 8:  329, 16:   97, 32:  1, 64: 0, 128: 0
client: 8007 total:  54722, 0: 25148, 1: 26107, 2: 1463, 4: 542, 8:  545, 16:  915, 32:  1, 64: 1, 128: 0
client: 8008 total:  92282, 0: 75384, 1: 14623, 2:  917, 4: 594, 8:  359, 16:  404, 32:  0, 64: 1, 128: 0
client: 8009 total:  59080, 0: 28775, 1: 26968, 2: 1591, 4: 567, 8:  438, 16:  732, 32:  8, 64: 1, 128: 0
client: 8010 total:  94372, 0: 74096, 1: 18259, 2:  903, 4: 520, 8:  343, 16:  250, 32:  0, 64: 1, 128: 0
client: 8011 total:  68429, 0: 32227, 1: 32569, 2: 2261, 4: 734, 8:  479, 16:  155, 32:  3, 64: 1, 128: 0
client: 8012 total:  87174, 0: 71272, 1: 13505, 2:  867, 4: 528, 8:  432, 16:  569, 32:  1, 64: 0, 128: 0
client: 8013 total:  64894, 0: 31692, 1: 29822, 2: 1837, 4: 682, 8:  435, 16:  424, 32:  1, 64: 1, 128: 0
client: 8014 total: 101275, 0: 81409, 1: 18076, 2:  949, 4: 530, 8:  268, 16:   42, 32:  0, 64: 1, 128: 0
client: 8015 total:  52166, 0: 27647, 1: 20893, 2: 1442, 4: 628, 8:  501, 16: 1040, 32: 15, 64: 0, 128: 0
client: 8016 total: 100955, 0: 79030, 1: 19997, 2: 1061, 4: 561, 8:  258, 16:   46, 32:  2, 64: 0, 128: 0
client: 8017 total:  55103, 0: 25360, 1: 26264, 2: 1514, 4: 556, 8:  483, 16:  925, 32:  0, 64: 1, 128: 0
client: 8018 total: 101058, 0: 80212, 1: 19107, 2:  923, 4: 534, 8:  255, 16:   25, 32:  1, 64: 1, 128: 0
client: 8019 total:  60505, 0: 29718, 1: 27155, 2: 1870, 4: 698, 8:  463, 16:  590, 32: 10, 64: 1, 128: 0
client: 8020 total:  88272, 0: 72067, 1: 13782, 2:  886, 4: 557, 8:  413, 16:  565, 32:  2, 64: 0, 128: 0
client: 8021 total:  61741, 0: 29676, 1: 28732, 2: 1694, 4: 591, 8:  457, 16:  587, 32:  3, 64: 1, 128: 0
client: 8022 total:  89918, 0: 73469, 1: 14042, 2:  903, 4: 555, 8:  447, 16:  501, 32:  0, 64: 1, 128: 0
client: 8023 total:  52771, 0: 26388, 1: 22833, 2: 1391, 4: 570, 8:  574, 16: 1009, 32:  5, 64: 1, 128: 0
client: 8024 total:  90558, 0: 71814, 1: 16613, 2:  852, 4: 501, 8:  320, 16:  456, 32:  1, 64: 1, 128: 0
client: 8025 total:  67534, 0: 32627, 1: 31357, 2: 2118, 4: 733, 8:  450, 16:  240, 32:  8, 64: 1, 128: 0
client: 8026 total:  89103, 0: 73391, 1: 13394, 2:  857, 4: 541, 8:  380, 16:  539, 32:  0, 64: 1, 128: 0
client: 8027 total:  64156, 0: 31064, 1: 29836, 2: 1774, 4: 610, 8:  372, 16:  497, 32:  2, 64: 0, 128: 1
client: 8028 total:  74099, 0: 59732, 1: 11598, 2:  517, 4: 456, 8:  689, 16: 1105, 32:  1, 64: 1, 128: 0
client: 8029 total:  58326, 0: 31863, 1: 22899, 2: 1652, 4: 578, 8:  468, 16:  857, 32:  9, 64: 0, 128: 0
client: 8030 total:  88146, 0: 71157, 1: 14618, 2:  881, 4: 560, 8:  418, 16:  511, 32:  0, 64: 1, 128: 0
client: 8031 total:  71225, 0: 33152, 1: 34738, 2: 2206, 4: 654, 8:  388, 16:   86, 32:  0, 64: 1, 128: 0
client: 8032 total:  86962, 0: 71089, 1: 13536, 2:  814, 4: 525, 8:  369, 16:  628, 32:  0, 64: 1, 128: 0
client: 8033 total:  65891, 0: 32062, 1: 30367, 2: 1961, 4: 709, 8:  459, 16:  321, 32: 11, 64: 1, 128: 0
client: 8034 total:  64344, 0: 53445, 1:  7714, 2:  569, 4: 445, 8:  617, 16: 1553, 32:  1, 64: 0, 128: 0
client: 8035 total:  61267, 0: 33487, 1: 24680, 2: 1397, 4: 562, 8:  443, 16:  680, 32: 18, 64: 0, 128: 0
client: 8036 total:  72258, 0: 62071, 1:  7165, 2:  659, 4: 481, 8:  488, 16: 1393, 32:  0, 64: 1, 128: 0
client: 8037 total:  49430, 0: 24902, 1: 21283, 2:  940, 4: 442, 8:  548, 16: 1314, 32:  0, 64: 1, 128: 0
client: 8038 total:  74546, 0: 60982, 1: 10722, 2:  664, 4: 484, 8:  532, 16: 1161, 32:  0, 64: 1, 128: 0
client: 8039 total:  61388, 0: 32670, 1: 25228, 2: 1696, 4: 662, 8:  499, 16:  623, 32: 10, 64: 0, 128: 0
client: 8040 total:  72193, 0: 60204, 1:  9102, 2:  633, 4: 474, 8:  567, 16: 1212, 32:  0, 64: 1, 128: 0
client: 8041 total:  70788, 0: 34450, 1: 32963, 2: 2150, 4: 681, 8:  409, 16:  133, 32:  1, 64: 0, 128: 1
client: 8042 total:  99573, 0: 79079, 1: 18589, 2: 1006, 4: 549, 8:  291, 16:   57, 32:  2, 64: 0, 128: 0
client: 8043 total:  61150, 0: 30443, 1: 27235, 2: 1833, 4: 586, 8:  451, 16:  589, 32: 13, 64: 0, 128: 0
client: 8044 total:  92953, 0: 73432, 1: 17453, 2:  858, 4: 529, 8:  320, 16:  360, 32:  0, 64: 1, 128: 0
client: 8045 total:  67159, 0: 31960, 1: 31586, 2: 2243, 4: 657, 8:  444, 16:  266, 32:  2, 64: 1, 128: 0
client: 8046 total:  89958, 0: 72951, 1: 14700, 2:  917, 4: 528, 8:  371, 16:  490, 32:  1, 64: 0, 128: 0
client: 8047 total:  64967, 0: 30897, 1: 30717, 2: 1828, 4: 638, 8:  499, 16:  386, 32:  1, 64: 1, 128: 0
client: 8048 total:  74985, 0: 62435, 1:  9756, 2:  697, 4: 447, 8:  483, 16: 1167, 32:  0, 64: 0, 128: 0
client: 8049 total:  68061, 0: 34043, 1: 30689, 2: 1923, 4: 704, 8:  427, 16:  268, 32:  6, 64: 1, 128: 0
client: 8050 total:  92109, 0: 74836, 1: 15038, 2:  887, 4: 561, 8:  395, 16:  391, 32:  0, 64: 1, 128: 0
client: 8051 total:  61302, 0: 29354, 1: 28320, 2: 1873, 4: 678, 8:  541, 16:  530, 32:  5, 64: 1, 128: 0
client: 8052 total:  93586, 0: 75682, 1: 15680, 2:  954, 4: 606, 8:  353, 16:  309, 32:  2, 64: 0, 128: 0
client: 8053 total:  62111, 0: 29798, 1: 28888, 2: 1761, 4: 649, 8:  478, 16:  534, 32:  2, 64: 1, 128: 0
client: 8054 total:  77598, 0: 63967, 1: 10939, 2:  565, 4: 449, 8:  656, 16: 1021, 32:  0, 64: 1, 128: 0
client: 8055 total:  52696, 0: 29572, 1: 19464, 2: 1470, 4: 573, 8:  496, 16: 1108, 32: 12, 64: 1, 128: 0
client: 8056 total: 101806, 0: 79977, 1: 19972, 2: 1061, 4: 515, 8:  245, 16:   35, 32:  0, 64: 1, 128: 0
client: 8057 total:  51714, 0: 23892, 1: 24298, 2: 1316, 4: 577, 8:  556, 16: 1074, 32:  0, 64: 1, 128: 0
client: 8058 total:  99891, 0: 78935, 1: 19039, 2: 1029, 4: 524, 8:  308, 16:   55, 32:  0, 64: 1, 128: 0
client: 8059 total:  61197, 0: 29219, 1: 28484, 2: 1797, 4: 639, 8:  476, 16:  580, 32:  1, 64: 1, 128: 0
client: 8060 total:  84812, 0: 67853, 1: 14444, 2:  822, 4: 472, 8:  457, 16:  763, 32:  0, 64: 1, 128: 0
client: 8061 total:  55538, 0: 25740, 1: 26231, 2: 1553, 4: 570, 8:  570, 16:  873, 32:  0, 64: 1, 128: 0
client: 8062 total: 100386, 0: 78818, 1: 19634, 2:  971, 4: 580, 8:  322, 16:   60, 32:  1, 64: 0, 128: 0
client: 8063 total:  54708, 0: 24943, 1: 26354, 2: 1391, 4: 561, 8:  547, 16:  909, 32:  2, 64: 1, 128: 0
client: 8064 total:  79777, 0: 66977, 1: 10038, 2:  776, 4: 535, 8:  464, 16:  986, 32:  0, 64: 1, 128: 0
client: 8065 total:  58150, 0: 27960, 1: 27001, 2: 1356, 4: 512, 8:  510, 16:  810, 32:  0, 64: 1, 128: 0
client: 8066 total:  99609, 0: 79518, 1: 18238, 2:  965, 4: 535, 8:  281, 16:   71, 32:  0, 64: 1, 128: 0
client: 8067 total:  55633, 0: 28332, 1: 23697, 2: 1602, 4: 607, 8:  546, 16:  839, 32:  9, 64: 1, 128: 0
client: 8068 total:  92506, 0: 73032, 1: 17323, 2:  910, 4: 575, 8:  347, 16:  318, 32:  0, 64: 1, 128: 0
client: 8069 total:  66166, 0: 31316, 1: 31148, 2: 2232, 4: 737, 8:  442, 16:  287, 32:  3, 64: 1, 128: 0
client: 8070 total:  73162, 0: 59514, 1: 10795, 2:  703, 4: 498, 8:  472, 16: 1179, 32:  0, 64: 1, 128: 0
client: 8071 total:  72629, 0: 35681, 1: 33578, 2: 2276, 4: 650, 8:  373, 16:   69, 32:  1, 64: 1, 128: 0
client: 8072 total:  89214, 0: 72864, 1: 14051, 2:  860, 4: 522, 8:  339, 16:  577, 32:  0, 64: 1, 128: 0
client: 8073 total:  62734, 0: 30110, 1: 29280, 2: 1779, 4: 590, 8:  445, 16:  529, 32:  0, 64: 1, 128: 0
client: 8074 total:  83958, 0: 67661, 1: 13755, 2:  820, 4: 537, 8:  459, 16:  725, 32:  0, 64: 1, 128: 0
client: 8075 total:  58355, 0: 29682, 1: 25160, 2: 1641, 4: 590, 8:  520, 16:  757, 32:  4, 64: 1, 128: 0
client: 8076 total:  77144, 0: 62866, 1: 11623, 2:  575, 4: 463, 8:  655, 16:  961, 32:  0, 64: 1, 128: 0
client: 8077 total:  60316, 0: 32464, 1: 24378, 2: 1656, 4: 650, 8:  453, 16:  699, 32: 15, 64: 1, 128: 0
client: 8078 total: 100348, 0: 79106, 1: 19344, 2: 1042, 4: 551, 8:  264, 16:   40, 32:  0, 64: 1, 128: 0
client: 8079 total:  61621, 0: 29765, 1: 28253, 2: 1938, 4: 638, 8:  464, 16:  561, 32:  1, 64: 1, 128: 0
client: 8080 total:  77042, 0: 62427, 1: 11994, 2:  560, 4: 440, 8:  620, 16: 1000, 32:  0, 64: 1, 128: 0
client: 8081 total:  64302, 0: 33050, 1: 27531, 2: 2110, 4: 680, 8:  437, 16:  490, 32:  3, 64: 1, 128: 0
client: 8082 total:  68569, 0: 56382, 1:  9139, 2:  581, 4: 454, 8:  603, 16: 1409, 32:  0, 64: 1, 128: 0
client: 8083 total:  60421, 0: 33506, 1: 23569, 2: 1533, 4: 605, 8:  462, 16:  734, 32: 11, 64: 1, 128: 0
client: 8084 total:  90413, 0: 74210, 1: 13899, 2:  878, 4: 534, 8:  382, 16:  508, 32:  2, 64: 0, 128: 0
client: 8085 total:  61547, 0: 29591, 1: 28607, 2: 1693, 4: 613, 8:  454, 16:  586, 32:  2, 64: 0, 128: 1
client: 8086 total:  73575, 0: 59394, 1: 11463, 2:  521, 4: 461, 8:  639, 16: 1096, 32:  0, 64: 1, 128: 0
client: 8087 total:  66359, 0: 34090, 1: 28727, 2: 2050, 4: 662, 8:  439, 16:  379, 32: 11, 64: 1, 128: 0
client: 8088 total: 100232, 0: 79400, 1: 18999, 2: 1007, 4: 541, 8:  242, 16:   42, 32:  0, 64: 1, 128: 0
client: 8089 total:  60029, 0: 29965, 1: 26429, 2: 1793, 4: 702, 8:  531, 16:  600, 32:  8, 64: 1, 128: 0
client: 8090 total:  83306, 0: 67111, 1: 13867, 2:  588, 4: 436, 8:  596, 16:  707, 32:  0, 64: 1, 128: 0
client: 8091 total:  63605, 0: 32317, 1: 27707, 2: 1981, 4: 613, 8:  488, 16:  490, 32:  8, 64: 1, 128: 0
client: 8092 total:  77397, 0: 63243, 1: 11578, 2:  512, 4: 446, 8:  595, 16: 1022, 32:  0, 64: 1, 128: 0
client: 8093 total:  58603, 0: 31124, 1: 23833, 2: 1712, 4: 650, 8:  539, 16:  734, 32: 10, 64: 0, 128: 1
client: 8094 total:  90750, 0: 72661, 1: 15890, 2:  861, 4: 514, 8:  377, 16:  445, 32:  2, 64: 0, 128: 0
client: 8095 total:  60656, 0: 31003, 1: 26152, 2: 1722, 4: 660, 8:  483, 16:  627, 32:  8, 64: 0, 128: 1
client: 8096 total:  95700, 0: 76523, 1: 17194, 2:  867, 4: 505, 8:  357, 16:  253, 32:  0, 64: 1, 128: 0
client: 8097 total:  55921, 0: 27431, 1: 24725, 2: 1731, 4: 642, 8:  575, 16:  815, 32:  1, 64: 1, 128: 0
client: 8098 total:  86615, 0: 71408, 1: 12726, 2:  872, 4: 497, 8:  392, 16:  719, 32:  1, 64: 0, 128: 0
client: 8099 total:  54340, 0: 25749, 1: 25300, 2: 1258, 4: 546, 8:  471, 16: 1012, 32:  3, 64: 0, 128: 1
client: 8100 total: 101811, 0: 81236, 1: 18803, 2:  992, 4: 517, 8:  230, 16:   32, 32:  0, 64: 1, 128: 0
client: 8101 total:  55106, 0: 28254, 1: 23328, 2: 1534, 4: 607, 8:  477, 16:  886, 32: 19, 64: 1, 128: 0
client: 8102 total:  97187, 0: 78106, 1: 16953, 2:  999, 4: 531, 8:  329, 16:  268, 32:  0, 64: 1, 128: 0
client: 8103 total:  49235, 0: 22971, 1: 22932, 2: 1126, 4: 447, 8:  508, 16: 1250, 32:  0, 64: 1, 128: 0
client: 8104 total:  79003, 0: 64417, 1: 11910, 2:  802, 4: 537, 8:  418, 16:  918, 32:  1, 64: 0, 128: 0
client: 8105 total:  70947, 0: 33860, 1: 33646, 2: 2262, 4: 687, 8:  387, 16:  104, 32:  0, 64: 1, 128: 0
client: 8106 total:  95571, 0: 75702, 1: 17737, 2:  965, 4: 526, 8:  314, 16:  326, 32:  1, 64: 0, 128: 0
client: 8107 total:  53974, 0: 25299, 1: 25117, 2: 1485, 4: 566, 8:  540, 16:  965, 32:  2, 64: 0, 128: 0
client: 8108 total: 100811, 0: 79960, 1: 19072, 2:  952, 4: 532, 8:  259, 16:   35, 32:  0, 64: 1, 128: 0
client: 8109 total:  57953, 0: 29463, 1: 24794, 2: 1693, 4: 694, 8:  626, 16:  671, 32: 11, 64: 1, 128: 0
client: 8110 total:  99779, 0: 79272, 1: 18659, 2:  989, 4: 503, 8:  286, 16:   69, 32:  0, 64: 1, 128: 0
client: 8111 total:  58895, 0: 29628, 1: 25699, 2: 1750, 4: 641, 8:  489, 16:  677, 32: 10, 64: 1, 128: 0
client: 8112 total: 100120, 0: 78217, 1: 20003, 2: 1002, 4: 564, 8:  291, 16:   42, 32:  1, 64: 0, 128: 0
client: 8113 total:  60226, 0: 27861, 1: 28981, 2: 1684, 4: 588, 8:  455, 16:  655, 32:  1, 64: 0, 128: 1
client: 8114 total:  76007, 0: 62207, 1: 10978, 2:  736, 4: 553, 8:  506, 16: 1026, 32:  0, 64: 1, 128: 0
client: 8115 total:  68739, 0: 33453, 1: 31888, 2: 2083, 4: 668, 8:  409, 16:  234, 32:  3, 64: 1, 128: 0
client: 8116 total:  72167, 0: 60625, 1:  8528, 2:  751, 4: 498, 8:  536, 16: 1228, 32:  0, 64: 1, 128: 0
client: 8117 total:  63070, 0: 32596, 1: 27312, 2: 1575, 4: 580, 8:  418, 16:  573, 32: 15, 64: 1, 128: 0
client: 8118 total:  94387, 0: 76269, 1: 15914, 2:  997, 4: 566, 8:  346, 16:  294, 32:  0, 64: 1, 128: 0
client: 8119 total:  60738, 0: 29308, 1: 28053, 2: 1700, 4: 610, 8:  441, 16:  623, 32:  2, 64: 1, 128: 0
client: 8120 total:  71831, 0: 58997, 1:  9899, 2:  588, 4: 506, 8:  645, 16: 1195, 32:  0, 64: 1, 128: 0
client: 8121 total:  61332, 0: 32584, 1: 25465, 2: 1524, 4: 619, 8:  464, 16:  662, 32: 13, 64: 1, 128: 0
client: 8122 total:  90574, 0: 72577, 1: 15600, 2:  958, 4: 540, 8:  411, 16:  487, 32:  0, 64: 1, 128: 0
client: 8123 total:  55340, 0: 26122, 1: 25865, 2: 1345, 4: 557, 8:  499, 16:  950, 32:  1, 64: 1, 128: 0
client: 8124 total:  65043, 0: 53995, 1:  7914, 2:  437, 4: 399, 8:  767, 16: 1530, 32:  0, 64: 1, 128: 0
client: 8125 total:  58368, 0: 33484, 1: 21438, 2: 1504, 4: 555, 8:  516, 16:  859, 32: 11, 64: 1, 128: 0
client: 8126 total:  99912, 0: 80003, 1: 18056, 2:  973, 4: 511, 8:  279, 16:   89, 32:  0, 64: 1, 128: 0
client: 8127 total:  58941, 0: 28988, 1: 26433, 2: 1744, 4: 598, 8:  476, 16:  695, 32:  6, 64: 1, 128: 0

MGLRU on
--------
client: 8000 total: 135911, 0: 131546, 1:  943, 2:  801, 4: 642, 8: 1977, 16:    2, 32:  0, 64: 0, 128: 0
client: 8001 total: 120756, 0: 116009, 1: 1341, 2:  854, 4: 675, 8: 1876, 16:    1, 32:  0, 64: 0, 128: 0
client: 8002 total: 140312, 0: 136194, 1:  930, 2:  766, 4: 701, 8: 1716, 16:    5, 32:  0, 64: 0, 128: 0
client: 8003 total: 120951, 0: 116166, 1: 1432, 2:  895, 4: 648, 8: 1807, 16:    3, 32:  0, 64: 0, 128: 0
client: 8004 total: 134239, 0: 129877, 1:  883, 2:  774, 4: 690, 8: 2011, 16:    4, 32:  0, 64: 0, 128: 0
client: 8005 total: 119151, 0: 114323, 1: 1373, 2:  880, 4: 606, 8: 1966, 16:    3, 32:  0, 64: 0, 128: 0
client: 8006 total: 140413, 0: 136236, 1:  945, 2:  806, 4: 688, 8: 1737, 16:    1, 32:  0, 64: 0, 128: 0
client: 8007 total: 120694, 0: 115888, 1: 1511, 2:  871, 4: 599, 8: 1822, 16:    3, 32:  0, 64: 0, 128: 0
client: 8008 total: 135646, 0: 131378, 1:  879, 2:  742, 4: 669, 8: 1973, 16:    5, 32:  0, 64: 0, 128: 0
client: 8009 total: 118450, 0: 113578, 1: 1303, 2:  952, 4: 610, 8: 2003, 16:    4, 32:  0, 64: 0, 128: 0
client: 8010 total: 134853, 0: 130437, 1:  951, 2:  806, 4: 664, 8: 1989, 16:    6, 32:  0, 64: 0, 128: 0
client: 8011 total: 121233, 0: 116564, 1: 1292, 2:  866, 4: 608, 8: 1902, 16:    1, 32:  0, 64: 0, 128: 0
client: 8012 total: 130700, 0: 126167, 1:  899, 2:  771, 4: 745, 8: 2112, 16:    6, 32:  0, 64: 0, 128: 0
client: 8013 total: 122335, 0: 117575, 1: 1407, 2:  931, 4: 603, 8: 1817, 16:    2, 32:  0, 64: 0, 128: 0
client: 8014 total: 132725, 0: 128209, 1: 1012, 2:  777, 4: 710, 8: 2010, 16:    7, 32:  0, 64: 0, 128: 0
client: 8015 total: 126804, 0: 122228, 1: 1484, 2:  964, 4: 602, 8: 1525, 16:    1, 32:  0, 64: 0, 128: 0
client: 8016 total: 137896, 0: 133606, 1:  944, 2:  774, 4: 728, 8: 1840, 16:    4, 32:  0, 64: 0, 128: 0
client: 8017 total: 119403, 0: 114641, 1: 1330, 2:  929, 4: 578, 8: 1924, 16:    1, 32:  0, 64: 0, 128: 0
client: 8018 total: 134848, 0: 130415, 1:  972, 2:  742, 4: 710, 8: 2007, 16:    2, 32:  0, 64: 0, 128: 0
client: 8019 total: 119305, 0: 114515, 1: 1370, 2:  895, 4: 577, 8: 1946, 16:    2, 32:  0, 64: 0, 128: 0
client: 8020 total: 140862, 0: 136704, 1:  932, 2:  805, 4: 695, 8: 1723, 16:    3, 32:  0, 64: 0, 128: 0
client: 8021 total: 120837, 0: 116074, 1: 1410, 2:  901, 4: 617, 8: 1831, 16:    4, 32:  0, 64: 0, 128: 0
client: 8022 total: 132710, 0: 128278, 1:  902, 2:  783, 4: 664, 8: 2079, 16:    4, 32:  0, 64: 0, 128: 0
client: 8023 total: 126671, 0: 122176, 1: 1420, 2:  916, 4: 576, 8: 1578, 16:    5, 32:  0, 64: 0, 128: 0
client: 8024 total: 135185, 0: 130845, 1:  878, 2:  815, 4: 669, 8: 1977, 16:    1, 32:  0, 64: 0, 128: 0
client: 8025 total: 122042, 0: 117414, 1: 1321, 2:  911, 4: 579, 8: 1814, 16:    3, 32:  0, 64: 0, 128: 0
client: 8026 total: 135082, 0: 130656, 1:  920, 2:  778, 4: 752, 8: 1974, 16:    2, 32:  0, 64: 0, 128: 0
client: 8027 total: 123020, 0: 118290, 1: 1440, 2:  944, 4: 583, 8: 1757, 16:    6, 32:  0, 64: 0, 128: 0
client: 8028 total: 135350, 0: 130949, 1:  880, 2:  846, 4: 705, 8: 1967, 16:    3, 32:  0, 64: 0, 128: 0
client: 8029 total: 119721, 0: 114996, 1: 1299, 2:  920, 4: 621, 8: 1882, 16:    3, 32:  0, 64: 0, 128: 0
client: 8030 total: 131991, 0: 127411, 1:  934, 2:  826, 4: 673, 8: 2144, 16:    3, 32:  0, 64: 0, 128: 0
client: 8031 total: 123002, 0: 118304, 1: 1385, 2:  939, 4: 652, 8: 1722, 16:    0, 32:  0, 64: 0, 128: 0
client: 8032 total: 130133, 0: 125547, 1:  952, 2:  758, 4: 656, 8: 2213, 16:    7, 32:  0, 64: 0, 128: 0
client: 8033 total: 125114, 0: 120632, 1: 1307, 2:  867, 4: 666, 8: 1640, 16:    2, 32:  0, 64: 0, 128: 0
client: 8034 total: 137708, 0: 133396, 1:  977, 2:  776, 4: 699, 8: 1853, 16:    7, 32:  0, 64: 0, 128: 0
client: 8035 total: 122820, 0: 118162, 1: 1377, 2:  899, 4: 629, 8: 1753, 16:    0, 32:  0, 64: 0, 128: 0
client: 8036 total: 132909, 0: 128405, 1:  935, 2:  791, 4: 664, 8: 2106, 16:    8, 32:  0, 64: 0, 128: 0
client: 8037 total: 121966, 0: 117335, 1: 1295, 2:  927, 4: 589, 8: 1816, 16:    4, 32:  0, 64: 0, 128: 0
client: 8038 total: 137967, 0: 133673, 1:  961, 2:  802, 4: 680, 8: 1847, 16:    4, 32:  0, 64: 0, 128: 0
client: 8039 total: 122542, 0: 117875, 1: 1410, 2:  890, 4: 598, 8: 1764, 16:    5, 32:  0, 64: 0, 128: 0
client: 8040 total: 137028, 0: 132749, 1:  953, 2:  774, 4: 649, 8: 1901, 16:    2, 32:  0, 64: 0, 128: 0
client: 8041 total: 122337, 0: 117649, 1: 1334, 2:  942, 4: 621, 8: 1787, 16:    4, 32:  0, 64: 0, 128: 0
client: 8042 total: 133651, 0: 129143, 1: 1012, 2:  786, 4: 680, 8: 2024, 16:    6, 32:  0, 64: 0, 128: 0
client: 8043 total: 124615, 0: 120053, 1: 1348, 2:  948, 4: 595, 8: 1667, 16:    4, 32:  0, 64: 0, 128: 0
client: 8044 total: 135503, 0: 131208, 1:  907, 2:  748, 4: 651, 8: 1982, 16:    7, 32:  0, 64: 0, 128: 0
client: 8045 total: 120529, 0: 115721, 1: 1400, 2:  943, 4: 601, 8: 1860, 16:    4, 32:  0, 64: 0, 128: 0
client: 8046 total: 131720, 0: 127244, 1:  865, 2:  768, 4: 754, 8: 2086, 16:    3, 32:  0, 64: 0, 128: 0
client: 8047 total: 124835, 0: 120237, 1: 1351, 2:  912, 4: 667, 8: 1666, 16:    2, 32:  0, 64: 0, 128: 0
client: 8048 total: 135540, 0: 131116, 1:  991, 2:  772, 4: 737, 8: 1917, 16:    7, 32:  0, 64: 0, 128: 0
client: 8049 total: 118887, 0: 114049, 1: 1357, 2:  900, 4: 649, 8: 1929, 16:    3, 32:  0, 64: 0, 128: 0
client: 8050 total: 131503, 0: 126909, 1:  951, 2:  809, 4: 706, 8: 2126, 16:    2, 32:  0, 64: 0, 128: 0
client: 8051 total: 119694, 0: 114937, 1: 1317, 2:  862, 4: 657, 8: 1920, 16:    1, 32:  0, 64: 0, 128: 0
client: 8052 total: 133973, 0: 129595, 1:  845, 2:  723, 4: 751, 8: 2059, 16:    0, 32:  0, 64: 0, 128: 0
client: 8053 total: 121011, 0: 116242, 1: 1372, 2:  930, 4: 610, 8: 1857, 16:    0, 32:  0, 64: 0, 128: 0
client: 8054 total: 139674, 0: 135515, 1:  969, 2:  774, 4: 663, 8: 1750, 16:    3, 32:  0, 64: 0, 128: 0
client: 8055 total: 121594, 0: 116707, 1: 1508, 2:  944, 4: 627, 8: 1808, 16:    0, 32:  0, 64: 0, 128: 0
client: 8056 total: 139949, 0: 135796, 1:  935, 2:  766, 4: 719, 8: 1727, 16:    6, 32:  0, 64: 0, 128: 0
client: 8057 total: 118338, 0: 113433, 1: 1398, 2:  900, 4: 664, 8: 1938, 16:    5, 32:  0, 64: 0, 128: 0
client: 8058 total: 133214, 0: 128824, 1:  833, 2:  828, 4: 667, 8: 2060, 16:    2, 32:  0, 64: 0, 128: 0
client: 8059 total: 126622, 0: 122081, 1: 1401, 2:  964, 4: 598, 8: 1576, 16:    2, 32:  0, 64: 0, 128: 0
client: 8060 total: 135070, 0: 130693, 1:  973, 2:  746, 4: 707, 8: 1949, 16:    2, 32:  0, 64: 0, 128: 0
client: 8061 total: 117996, 0: 113126, 1: 1263, 2:  922, 4: 659, 8: 2024, 16:    2, 32:  0, 64: 0, 128: 0
client: 8062 total: 141314, 0: 137223, 1:  921, 2:  752, 4: 732, 8: 1684, 16:    2, 32:  0, 64: 0, 128: 0
client: 8063 total: 115228, 0: 110205, 1: 1352, 2:  927, 4: 623, 8: 2118, 16:    3, 32:  0, 64: 0, 128: 0
client: 8064 total: 142955, 0: 138947, 1:  956, 2:  769, 4: 691, 8: 1590, 16:    2, 32:  0, 64: 0, 128: 0
client: 8065 total: 122755, 0: 118086, 1: 1407, 2:  908, 4: 651, 8: 1701, 16:    2, 32:  0, 64: 0, 128: 0
client: 8066 total: 136624, 0: 132286, 1:  957, 2:  767, 4: 704, 8: 1906, 16:    4, 32:  0, 64: 0, 128: 0
client: 8067 total: 120349, 0: 115590, 1: 1421, 2:  894, 4: 542, 8: 1902, 16:    0, 32:  0, 64: 0, 128: 0
client: 8068 total: 131028, 0: 126382, 1:  921, 2:  853, 4: 677, 8: 2189, 16:    6, 32:  0, 64: 0, 128: 0
client: 8069 total: 117941, 0: 113116, 1: 1297, 2:  870, 4: 607, 8: 2050, 16:    1, 32:  0, 64: 0, 128: 0
client: 8070 total: 137469, 0: 133203, 1:  864, 2:  786, 4: 692, 8: 1921, 16:    3, 32:  0, 64: 0, 128: 0
client: 8071 total: 116240, 0: 111288, 1: 1371, 2:  872, 4: 612, 8: 2094, 16:    3, 32:  0, 64: 0, 128: 0
client: 8072 total: 137278, 0: 132999, 1:  926, 2:  782, 4: 762, 8: 1807, 16:    2, 32:  0, 64: 0, 128: 0
client: 8073 total: 121421, 0: 116615, 1: 1469, 2:  942, 4: 566, 8: 1827, 16:    2, 32:  0, 64: 0, 128: 0
client: 8074 total: 130281, 0: 125717, 1:  906, 2:  789, 4: 632, 8: 2228, 16:    9, 32:  0, 64: 0, 128: 0
client: 8075 total: 120662, 0: 115944, 1: 1307, 2:  919, 4: 573, 8: 1918, 16:    1, 32:  0, 64: 0, 128: 0
client: 8076 total: 139454, 0: 135179, 1:  969, 2:  842, 4: 692, 8: 1769, 16:    3, 32:  0, 64: 0, 128: 0
client: 8077 total: 120573, 0: 115764, 1: 1440, 2:  917, 4: 615, 8: 1835, 16:    2, 32:  0, 64: 0, 128: 0
client: 8078 total: 137138, 0: 132847, 1:  912, 2:  770, 4: 703, 8: 1903, 16:    3, 32:  0, 64: 0, 128: 0
client: 8079 total: 116102, 0: 111135, 1: 1343, 2:  908, 4: 607, 8: 2108, 16:    1, 32:  0, 64: 0, 128: 0
client: 8080 total: 137181, 0: 132934, 1:  880, 2:  758, 4: 711, 8: 1896, 16:    2, 32:  0, 64: 0, 128: 0
client: 8081 total: 120698, 0: 115918, 1: 1451, 2:  880, 4: 589, 8: 1856, 16:    4, 32:  0, 64: 0, 128: 0
client: 8082 total: 140708, 0: 136601, 1:  963, 2:  743, 4: 679, 8: 1718, 16:    4, 32:  0, 64: 0, 128: 0
client: 8083 total: 120397, 0: 115540, 1: 1465, 2:  968, 4: 554, 8: 1867, 16:    3, 32:  0, 64: 0, 128: 0
client: 8084 total: 136258, 0: 131848, 1:  946, 2:  853, 4: 710, 8: 1899, 16:    2, 32:  0, 64: 0, 128: 0
client: 8085 total: 122964, 0: 118257, 1: 1470, 2:  906, 4: 598, 8: 1729, 16:    4, 32:  0, 64: 0, 128: 0
client: 8086 total: 139809, 0: 135609, 1:  967, 2:  769, 4: 670, 8: 1792, 16:    2, 32:  0, 64: 0, 128: 0
client: 8087 total: 112384, 0: 107326, 1: 1285, 2:  855, 4: 659, 8: 2254, 16:    5, 32:  0, 64: 0, 128: 0
client: 8088 total: 139801, 0: 135647, 1:  904, 2:  770, 4: 707, 8: 1769, 16:    4, 32:  0, 64: 0, 128: 0
client: 8089 total: 116987, 0: 112011, 1: 1402, 2:  882, 4: 657, 8: 2032, 16:    3, 32:  0, 64: 0, 128: 0
client: 8090 total: 136223, 0: 131911, 1:  950, 2:  771, 4: 649, 8: 1938, 16:    4, 32:  0, 64: 0, 128: 0
client: 8091 total: 119708, 0: 114922, 1: 1342, 2:  887, 4: 629, 8: 1924, 16:    4, 32:  0, 64: 0, 128: 0
client: 8092 total: 135010, 0: 130541, 1: 1016, 2:  773, 4: 673, 8: 2005, 16:    2, 32:  0, 64: 0, 128: 0
client: 8093 total: 123205, 0: 118580, 1: 1321, 2:  894, 4: 734, 8: 1670, 16:    6, 32:  0, 64: 0, 128: 0
client: 8094 total: 136616, 0: 132297, 1:  935, 2:  768, 4: 640, 8: 1968, 16:    8, 32:  0, 64: 0, 128: 0
client: 8095 total: 117438, 0: 112540, 1: 1405, 2:  856, 4: 600, 8: 2034, 16:    3, 32:  0, 64: 0, 128: 0
client: 8096 total: 134197, 0: 129768, 1:  918, 2:  797, 4: 700, 8: 2013, 16:    1, 32:  0, 64: 0, 128: 0
client: 8097 total: 120484, 0: 115747, 1: 1342, 2:  895, 4: 588, 8: 1908, 16:    4, 32:  0, 64: 0, 128: 0
client: 8098 total: 140243, 0: 136068, 1:  913, 2:  808, 4: 722, 8: 1726, 16:    6, 32:  0, 64: 0, 128: 0
client: 8099 total: 120246, 0: 115400, 1: 1421, 2:  932, 4: 623, 8: 1870, 16:    0, 32:  0, 64: 0, 128: 0
client: 8100 total: 137723, 0: 133391, 1: 1009, 2:  808, 4: 650, 8: 1863, 16:    2, 32:  0, 64: 0, 128: 0
client: 8101 total: 120033, 0: 115244, 1: 1390, 2:  879, 4: 660, 8: 1857, 16:    3, 32:  0, 64: 0, 128: 0
client: 8102 total: 133661, 0: 129007, 1: 1059, 2:  851, 4: 691, 8: 2048, 16:    5, 32:  0, 64: 0, 128: 0
client: 8103 total: 121359, 0: 116670, 1: 1393, 2:  850, 4: 592, 8: 1848, 16:    6, 32:  0, 64: 0, 128: 0
client: 8104 total: 140702, 0: 136607, 1:  918, 2:  758, 4: 675, 8: 1735, 16:    9, 32:  0, 64: 0, 128: 0
client: 8105 total: 117126, 0: 112234, 1: 1310, 2:  935, 4: 658, 8: 1989, 16:    0, 32:  0, 64: 0, 128: 0
client: 8106 total: 137042, 0: 132779, 1:  882, 2:  816, 4: 707, 8: 1856, 16:    2, 32:  0, 64: 0, 128: 0
client: 8107 total: 121781, 0: 116961, 1: 1487, 2:  920, 4: 614, 8: 1795, 16:    4, 32:  0, 64: 0, 128: 0
client: 8108 total: 131094, 0: 126590, 1:  861, 2:  800, 4: 657, 8: 2182, 16:    4, 32:  0, 64: 0, 128: 0
client: 8109 total: 121888, 0: 117200, 1: 1333, 2:  952, 4: 620, 8: 1781, 16:    2, 32:  0, 64: 0, 128: 0
client: 8110 total: 136023, 0: 131748, 1:  895, 2:  751, 4: 739, 8: 1885, 16:    5, 32:  0, 64: 0, 128: 0
client: 8111 total: 112796, 0: 107705, 1: 1278, 2:  907, 4: 634, 8: 2269, 16:    3, 32:  0, 64: 0, 128: 0
client: 8112 total: 132701, 0: 128249, 1:  840, 2:  806, 4: 696, 8: 2106, 16:    4, 32:  0, 64: 0, 128: 0
client: 8113 total: 121891, 0: 117107, 1: 1438, 2:  915, 4: 621, 8: 1805, 16:    5, 32:  0, 64: 0, 128: 0
client: 8114 total: 139379, 0: 135237, 1:  879, 2:  777, 4: 655, 8: 1828, 16:    3, 32:  0, 64: 0, 128: 0
client: 8115 total: 114118, 0: 108925, 1: 1390, 2:  939, 4: 709, 8: 2152, 16:    3, 32:  0, 64: 0, 128: 0
client: 8116 total: 129121, 0: 124433, 1:  992, 2:  803, 4: 620, 8: 2269, 16:    4, 32:  0, 64: 0, 128: 0
client: 8117 total: 123295, 0: 118644, 1: 1339, 2:  927, 4: 650, 8: 1731, 16:    4, 32:  0, 64: 0, 128: 0
client: 8118 total: 139996, 0: 135890, 1:  880, 2:  797, 4: 652, 8: 1773, 16:    4, 32:  0, 64: 0, 128: 0
client: 8119 total: 118785, 0: 114003, 1: 1371, 2:  866, 4: 577, 8: 1968, 16:    0, 32:  0, 64: 0, 128: 0
client: 8120 total: 139403, 0: 135284, 1:  848, 2:  798, 4: 700, 8: 1769, 16:    4, 32:  0, 64: 0, 128: 0
client: 8121 total: 118842, 0: 114098, 1: 1286, 2:  849, 4: 631, 8: 1974, 16:    4, 32:  0, 64: 0, 128: 0
client: 8122 total: 139147, 0: 134951, 1:  931, 2:  804, 4: 669, 8: 1792, 16:    0, 32:  0, 64: 0, 128: 0
client: 8123 total: 117281, 0: 112432, 1: 1304, 2:  918, 4: 643, 8: 1981, 16:    3, 32:  0, 64: 0, 128: 0
client: 8124 total: 132181, 0: 127728, 1:  847, 2:  805, 4: 683, 8: 2112, 16:    6, 32:  0, 64: 0, 128: 0
client: 8125 total: 123172, 0: 118464, 1: 1397, 2:  928, 4: 645, 8: 1737, 16:    1, 32:  0, 64: 0, 128: 0
client: 8126 total: 134014, 0: 129499, 1:  939, 2:  834, 4: 692, 8: 2049, 16:    1, 32:  0, 64: 0, 128: 0
client: 8127 total: 120439, 0: 115674, 1: 1342, 2:  869, 4: 666, 8: 1887, 16:    1, 32:  0, 64: 0, 128: 0
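
As a reading aid, the tail buckets for one client can be compared across
the two configurations in R. This is an editorial sketch that assumes
each "N:" column counts samples in the latency bucket labeled N, so that
positions 6-9 of each row are the 16/32/64/128 buckets; the vectors below
are copied from the client 8127 rows in the two listings above:

> off <- c(28988, 26433, 1744, 598, 476, 695, 6, 1, 0)  # client 8127, MGLRU off
> on  <- c(115674, 1342, 869, 666, 1887, 1, 0, 0, 0)    # client 8127, MGLRU on
> round(100 * c(sum(off[6:9]) / sum(off), sum(on[6:9]) / sum(on)), 4)
[1] 1.1910 0.0008

Under that reading, MGLRU on leaves this client with essentially nothing
in the 16-and-up buckets, consistent with the all-zero 32/64/128 columns
throughout the MGLRU-on listing.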


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Java / POWER9 benchmark with MGLRU
  2022-12-01 22:39 [PATCH mm-unstable v1 0/8] mm: multi-gen LRU: memcg LRU Yu Zhao
                   ` (8 preceding siblings ...)
  2022-12-20 21:49 ` JavaScript / Ampere Altra benchmark with MGLRU Yu Zhao
@ 2022-12-21  0:07 ` Yu Zhao
  9 siblings, 0 replies; 14+ messages in thread
From: Yu Zhao @ 2022-12-21  0:07 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Aneesh Kumar, Jonathan Corbet, Linus Torvalds,
	Michael Larabel, Vaibhav Jain, linux-kernel, linux-mm

[This is a resend. The original message was lost:
https://lore.kernel.org/r/20221219193613.998597-1-yuzhao@google.com/]

TLDR
====
SPECjbb2015 groups [1]    Critical jOPS (95% CI)    Max jOPS (95% CI)
---------------------------------------------------------------------
20                        NS                        NS
30                        +[4, 7]%                  NS
40                        OOM killed                OOM killed

Abbreviations
=============
CI:   confidence interval
NS:   no statistically significant difference
DUT:  device under test
ATE:  automatic test equipment

Rationale
=========
1. Java has been the most popular programming language for most of
   the last two decades, according to the TIOBE Programming Community
   Index [2].
2. Power ISA is the longest-lasting alternative to x86 for the server
   segment [3].
3. SPECjbb2015 is the industry-standard benchmark for Java.

Hardware
========
DUT $ lscpu
Architecture:          ppc64le
  Byte Order:          Little Endian
CPU(s):                184
  On-line CPU(s) list: 0-183
Model name:            POWER9 (raw), altivec supported
  Model:               2.2 (pvr 004e 1202)
  Thread(s) per core:  4
  Core(s) per socket:  23
  Socket(s):           2
  CPU max MHz:         3000.0000
  CPU min MHz:         2300.0000
Caches (sum of all):
  L1d:                 1.4 MiB (46 instances)
  L1i:                 1.4 MiB (46 instances)
  L2:                  12 MiB (24 instances)
  L3:                  240 MiB (24 instances)
NUMA:
  NUMA node(s):        2
  NUMA node0 CPU(s):   0-91
  NUMA node1 CPU(s):   92-183
Vulnerabilities:
  Itlb multihit:       Not affected
  L1tf:                Mitigation; RFI Flush, L1D private per thread
  Mds:                 Not affected
  Meltdown:            Mitigation; RFI Flush, L1D private per thread
  Mmio stale data:     Not affected
  Retbleed:            Not affected
  Spec store bypass:   Mitigation; Kernel entry/exit barrier (eieio)
  Spectre v1:          Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
  Spectre v2:          Mitigation; Indirect branch serialisation (kernel only), Indirect branch cache disabled, Software link stack flush
  Srbds:               Not affected
  Tsx async abort:     Not affected

DUT $ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91
node 0 size: 261659 MB
node 0 free: 259051 MB
node 1 cpus: 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160
161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183
node 1 size: 261713 MB
node 1 free: 257499 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

DUT $ cat /sys/class/nvme/nvme0/model
INTEL SSDPF21Q800GB

DUT $ cat /sys/class/nvme/nvme0/numa_node
0

DUT $ cat /sys/class/nvme/nvme1/model
INTEL SSDPF21Q800GB

DUT $ cat /sys/class/nvme/nvme1/numa_node
1

Software
========
DUT $ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04 LTS"

DUT $ uname -a
Linux ppc 6.1.0-rc8-mglru #1 SMP Tue Dec  6 06:18:48 UTC 2022 ppc64le ppc64le ppc64le GNU/Linux

DUT $ cat /proc/swaps
Filename        Type         Size         Used  Priority
/dev/nvme0n1    partition    268435392    0     -2
/dev/nvme1n1    partition    268435392    0     -3

DUT $ java --version
openjdk 11.0.16 2022-07-19
OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu122.04, mixed mode)

DUT $ cat specjbb2015/version.txt
SPECjbb2015 1.03 (11/14/2019)

Procedure
=========
DUT $ cat run_specjbb2015.sh
# Disable automatic NUMA balancing so the numactl bindings below fully
# control CPU and memory placement.
echo 0 >/proc/sys/kernel/numa_balancing

nodes=2
memcgs=$1    # number of memcgs; total SPECjbb groups = nodes * memcgs

run() {
    memcg=$1
    path=/sys/fs/cgroup/memcg$memcg

    # Create the memcg and move this subshell into it so that the JVMs
    # started below inherit it.
    mkdir $path
    echo $BASHPID >$path/cgroup.procs

    # One backend/txinjector pair per NUMA node, pinned to that node.
    for ((node = 0; node < $nodes; node++)); do
        group=$((nodes * memcg + node))

        numactl -N $node -m $node java -jar specjbb2015.jar \
            -m backend -G GRP$group -J JVM0 &
        numactl -N $node -m $node java -jar specjbb2015.jar \
            -m txinjector -G GRP$group -J JVM1 &
    done

    wait
}

# The multicontroller drives all groups; it is not moved into any of the
# per-test memcgs.
numactl -N 0 -m 0 java -Dspecjbb.group.count=$((nodes * memcgs)) \
        -Dspecjbb.controller.rtcurve.warmup.step=0.8 \
        -jar specjbb2015.jar -m multicontroller &

for ((memcg = 0; memcg < $memcgs; memcg++)); do
    run $memcg &
done

wait
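
For reference, the 30-group rows in the TLDR would presumably correspond
to an invocation like the one below, since the script spawns
nodes * memcgs = 2 * $1 groups (an editorial sketch; the actual
invocation is not shown in the message):

DUT $ bash run_specjbb2015.sh 15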

Results
=======
Critical jOPS (30 groups)
-------------------------
$ R
> a <- c(33786, 34903, 34254, 34608, 33149, 34530, 33867, 33691, 33284, 34490)
> b <- c(35192, 36691, 35771, 36399, 36321, 35177, 35792, 36145, 36594, 36207)
> t.test(a, b)

        Welch Two Sample t-test

data:  a and b
t = -7.8327, df = 17.828, p-value = 3.529e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2502.195 -1443.205
sample estimates:
mean of x mean of y
  34056.2   36028.9
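
The +[4, 7]% entry in the TLDR presumably follows from dividing the
absolute CI endpoints above by the baseline mean(a); a quick check in
the same R session (an editorial addition, not part of the original
log):

> round(100 * c(1443.205, 2502.195) / mean(a), 1)
[1] 4.2 7.3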

Max jOPS (30 groups)
--------------------
$ R
> a <- c(61310, 60640, 60515, 59820, 60239, 60140, 60074, 60761, 59099, 59843)
> b <- c(60338, 60515, 60338, 58305, 59660, 62372, 59820, 61499, 60338, 60338)
> t.test(a, b)

        Welch Two Sample t-test

data:  a and b
t = -0.27732, df = 14.231, p-value = 0.7855
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -943.7491  727.3491
sample estimates:
mean of x mean of y
  60244.1   60352.3
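
With p = 0.7855, the difference is not significant at the 0.05 level,
which is presumably what the NS entries in the TLDR denote; a one-line
check in the same session (editorial addition):

> t.test(a, b)$p.value > 0.05
[1] TRUE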

References
==========
[1] https://www.spec.org/jbb2015/docs/userguide.pdf
[2] https://www.tiobe.com/tiobe-index/
[3] https://cloud.google.com/blog/products/gcp/introducing-zaius-google-and-rackspaces-open-server-running-ibm-power9

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-12-21  0:08 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-01 22:39 [PATCH mm-unstable v1 0/8] mm: multi-gen LRU: memcg LRU Yu Zhao
2022-12-01 22:39 ` [PATCH mm-unstable v1 1/8] mm: multi-gen LRU: rename lru_gen_struct to lru_gen_folio Yu Zhao
2022-12-01 22:39 ` [PATCH mm-unstable v1 2/8] mm: multi-gen LRU: rename lrugen->lists[] to lrugen->folios[] Yu Zhao
2022-12-01 22:39 ` [PATCH mm-unstable v1 3/8] mm: multi-gen LRU: remove eviction fairness safeguard Yu Zhao
2022-12-11  3:59   ` Chen Wandun
2022-12-01 22:39 ` [PATCH mm-unstable v1 4/8] mm: multi-gen LRU: remove aging " Yu Zhao
2022-12-01 22:39 ` [PATCH mm-unstable v1 5/8] mm: multi-gen LRU: shuffle should_run_aging() Yu Zhao
2022-12-01 22:39 ` [PATCH mm-unstable v1 6/8] mm: multi-gen LRU: per-node lru_gen_folio lists Yu Zhao
2022-12-03  4:20   ` Hillf Danton
2022-12-01 22:39 ` [PATCH mm-unstable v1 7/8] mm: multi-gen LRU: clarify scan_control flags Yu Zhao
2022-12-02  4:17   ` Hillf Danton
2022-12-01 22:39 ` [PATCH mm-unstable v1 8/8] mm: multi-gen LRU: simplify arch_has_hw_pte_young() check Yu Zhao
2022-12-20 21:49 ` JavaScript / Ampere Altra benchmark with MGLRU Yu Zhao
2022-12-21  0:07 ` Java / POWER9 " Yu Zhao
