* [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve
@ 2023-01-18 0:18 T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 1/7] mm: multi-gen LRU: section for working set protection T.J. Alumbaugh
` (7 more replies)
0 siblings, 8 replies; 10+ messages in thread
From: T.J. Alumbaugh @ 2023-01-18 0:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, linux-mm, T.J. Alumbaugh
This patch series improves a few MGLRU functions, collects related
functions, and adds additional documentation.
T.J. Alumbaugh (7):
mm: multi-gen LRU: section for working set protection
mm: multi-gen LRU: section for rmap/PT walk feedback
mm: multi-gen LRU: section for Bloom filters
mm: multi-gen LRU: section for memcg LRU
mm: multi-gen LRU: improve lru_gen_exit_memcg()
mm: multi-gen LRU: improve walk_pmd_range()
mm: multi-gen LRU: simplify lru_gen_look_around()
Documentation/mm/multigen_lru.rst | 78 ++++-
include/linux/mm_inline.h | 17 -
include/linux/mmzone.h | 13 +-
mm/memcontrol.c | 8 +-
mm/vmscan.c | 534 ++++++++++++++++--------------
5 files changed, 360 insertions(+), 290 deletions(-)
--
2.39.0.314.g84b9a713c41-goog
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH mm-unstable v1 1/7] mm: multi-gen LRU: section for working set protection
2023-01-18 0:18 [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve T.J. Alumbaugh
@ 2023-01-18 0:18 ` T.J. Alumbaugh
2023-01-18 1:00 ` Matthew Wilcox
2023-01-18 0:18 ` [PATCH mm-unstable v1 2/7] mm: multi-gen LRU: section for rmap/PT walk feedback T.J. Alumbaugh
` (6 subsequent siblings)
7 siblings, 1 reply; 10+ messages in thread
From: T.J. Alumbaugh @ 2023-01-18 0:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, linux-mm, T.J. Alumbaugh
Add a section for working set protection in the code and the design
doc. The admin doc already contains its usage.
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
---
Documentation/mm/multigen_lru.rst | 15 +++++++++++++++
mm/vmscan.c | 4 ++++
2 files changed, 19 insertions(+)
diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index d8f721f98868..6e1483e70fdc 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -141,6 +141,21 @@ loop has detected outlying refaults from the tier this page is in. To
this end, the feedback loop uses the first tier as the baseline, for
the reason stated earlier.
+Working set protection
+----------------------
+Each generation is timestamped at birth. If ``lru_gen_min_ttl`` is
+set, an ``lruvec`` is protected from the eviction when its oldest
+generation was born within ``lru_gen_min_ttl`` milliseconds. In other
+words, it prevents the working set of ``lru_gen_min_ttl`` milliseconds
+from getting evicted. The OOM killer is triggered if this working set
+cannot be kept in memory.
+
+This time-based approach has the following advantages:
+
+1. It is easier to configure because it is agnostic to applications
+ and memory sizes.
+2. It is more reliable because it is directly wired to the OOM killer.
+
Summary
-------
The multi-gen LRU can be disassembled into the following parts:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 394ff4962cbc..a741765896b6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4475,6 +4475,10 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
return true;
}
+/******************************************************************************
+ * working set protection
+ ******************************************************************************/
+
static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
{
int gen, type, zone;
--
2.39.0.314.g84b9a713c41-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH mm-unstable v1 2/7] mm: multi-gen LRU: section for rmap/PT walk feedback
2023-01-18 0:18 [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 1/7] mm: multi-gen LRU: section for working set protection T.J. Alumbaugh
@ 2023-01-18 0:18 ` T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 3/7] mm: multi-gen LRU: section for Bloom filters T.J. Alumbaugh
` (5 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: T.J. Alumbaugh @ 2023-01-18 0:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, linux-mm, T.J. Alumbaugh
Add a section for lru_gen_look_around() in the code and the design
doc.
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
---
Documentation/mm/multigen_lru.rst | 14 ++++++++++++++
mm/vmscan.c | 4 ++++
2 files changed, 18 insertions(+)
diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index 6e1483e70fdc..bd988a142bc2 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -156,6 +156,20 @@ This time-based approach has the following advantages:
and memory sizes.
2. It is more reliable because it is directly wired to the OOM killer.
+Rmap/PT walk feedback
+---------------------
+Searching the rmap for PTEs mapping each page on an LRU list (to test
+and clear the accessed bit) can be expensive because pages from
+different VMAs (PA space) are not cache friendly to the rmap (VA
+space). For workloads mostly using mapped pages, searching the rmap
+can incur the highest CPU cost in the reclaim path.
+
+``lru_gen_look_around()`` exploits spatial locality to reduce the
+trips into the rmap. It scans the adjacent PTEs of a young PTE and
+promotes hot pages. If the scan was done cacheline efficiently, it
+adds the PMD entry pointing to the PTE table to the Bloom filter. This
+forms a feedback loop between the eviction and the aging.
+
Summary
-------
The multi-gen LRU can be disassembled into the following parts:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a741765896b6..eb9263bf6806 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4569,6 +4569,10 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
}
}
+/******************************************************************************
+ * rmap/PT walk feedback
+ ******************************************************************************/
+
/*
* This function exploits spatial locality when shrink_folio_list() walks the
* rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If
--
2.39.0.314.g84b9a713c41-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH mm-unstable v1 3/7] mm: multi-gen LRU: section for Bloom filters
2023-01-18 0:18 [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 1/7] mm: multi-gen LRU: section for working set protection T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 2/7] mm: multi-gen LRU: section for rmap/PT walk feedback T.J. Alumbaugh
@ 2023-01-18 0:18 ` T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 4/7] mm: multi-gen LRU: section for memcg LRU T.J. Alumbaugh
` (4 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: T.J. Alumbaugh @ 2023-01-18 0:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, linux-mm, T.J. Alumbaugh
Move Bloom filters code into a dedicated section. Improve the design
doc to explain Bloom filter usage and connection between aging and
eviction in their use.
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
---
Documentation/mm/multigen_lru.rst | 16 +++
mm/vmscan.c | 180 +++++++++++++++---------------
2 files changed, 108 insertions(+), 88 deletions(-)
diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index bd988a142bc2..770b5d539856 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -170,6 +170,22 @@ promotes hot pages. If the scan was done cacheline efficiently, it
adds the PMD entry pointing to the PTE table to the Bloom filter. This
forms a feedback loop between the eviction and the aging.
+Bloom Filters
+-------------
+Bloom filters are a space and memory efficient data structure for set
+membership test, i.e., test if an element is not in the set or may be
+in the set.
+
+In the eviction path, specifically, in ``lru_gen_look_around()``, if a
+PMD has a sufficient number of hot pages, its address is placed in the
+filter. In the aging path, set membership means that the PTE range
+will be scanned for young pages.
+
+Note that Bloom filters are probabilistic on set membership. If a test
+is false positive, the cost is an additional scan of a range of PTEs,
+which may yield hot pages anyway. Parameters of the filter itself can
+control the false positive rate in the limit.
+
Summary
-------
The multi-gen LRU can be disassembled into the following parts:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eb9263bf6806..1be9120349f8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3233,6 +3233,98 @@ static bool __maybe_unused seq_is_valid(struct lruvec *lruvec)
get_nr_gens(lruvec, LRU_GEN_ANON) <= MAX_NR_GENS;
}
+/******************************************************************************
+ * Bloom filters
+ ******************************************************************************/
+
+/*
+ * Bloom filters with m=1<<15, k=2 and the false positive rates of ~1/5 when
+ * n=10,000 and ~1/2 when n=20,000, where, conventionally, m is the number of
+ * bits in a bitmap, k is the number of hash functions and n is the number of
+ * inserted items.
+ *
+ * Page table walkers use one of the two filters to reduce their search space.
+ * To get rid of non-leaf entries that no longer have enough leaf entries, the
+ * aging uses the double-buffering technique to flip to the other filter each
+ * time it produces a new generation. For non-leaf entries that have enough
+ * leaf entries, the aging carries them over to the next generation in
+ * walk_pmd_range(); the eviction also report them when walking the rmap
+ * in lru_gen_look_around().
+ *
+ * For future optimizations:
+ * 1. It's not necessary to keep both filters all the time. The spare one can be
+ * freed after the RCU grace period and reallocated if needed again.
+ * 2. And when reallocating, it's worth scaling its size according to the number
+ * of inserted entries in the other filter, to reduce the memory overhead on
+ * small systems and false positives on large systems.
+ * 3. Jenkins' hash function is an alternative to Knuth's.
+ */
+#define BLOOM_FILTER_SHIFT 15
+
+static inline int filter_gen_from_seq(unsigned long seq)
+{
+ return seq % NR_BLOOM_FILTERS;
+}
+
+static void get_item_key(void *item, int *key)
+{
+ u32 hash = hash_ptr(item, BLOOM_FILTER_SHIFT * 2);
+
+ BUILD_BUG_ON(BLOOM_FILTER_SHIFT * 2 > BITS_PER_TYPE(u32));
+
+ key[0] = hash & (BIT(BLOOM_FILTER_SHIFT) - 1);
+ key[1] = hash >> BLOOM_FILTER_SHIFT;
+}
+
+static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
+{
+ int key[2];
+ unsigned long *filter;
+ int gen = filter_gen_from_seq(seq);
+
+ filter = READ_ONCE(lruvec->mm_state.filters[gen]);
+ if (!filter)
+ return true;
+
+ get_item_key(item, key);
+
+ return test_bit(key[0], filter) && test_bit(key[1], filter);
+}
+
+static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
+{
+ int key[2];
+ unsigned long *filter;
+ int gen = filter_gen_from_seq(seq);
+
+ filter = READ_ONCE(lruvec->mm_state.filters[gen]);
+ if (!filter)
+ return;
+
+ get_item_key(item, key);
+
+ if (!test_bit(key[0], filter))
+ set_bit(key[0], filter);
+ if (!test_bit(key[1], filter))
+ set_bit(key[1], filter);
+}
+
+static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq)
+{
+ unsigned long *filter;
+ int gen = filter_gen_from_seq(seq);
+
+ filter = lruvec->mm_state.filters[gen];
+ if (filter) {
+ bitmap_clear(filter, 0, BIT(BLOOM_FILTER_SHIFT));
+ return;
+ }
+
+ filter = bitmap_zalloc(BIT(BLOOM_FILTER_SHIFT),
+ __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
+ WRITE_ONCE(lruvec->mm_state.filters[gen], filter);
+}
+
/******************************************************************************
* mm_struct list
******************************************************************************/
@@ -3352,94 +3444,6 @@ void lru_gen_migrate_mm(struct mm_struct *mm)
}
#endif
-/*
- * Bloom filters with m=1<<15, k=2 and the false positive rates of ~1/5 when
- * n=10,000 and ~1/2 when n=20,000, where, conventionally, m is the number of
- * bits in a bitmap, k is the number of hash functions and n is the number of
- * inserted items.
- *
- * Page table walkers use one of the two filters to reduce their search space.
- * To get rid of non-leaf entries that no longer have enough leaf entries, the
- * aging uses the double-buffering technique to flip to the other filter each
- * time it produces a new generation. For non-leaf entries that have enough
- * leaf entries, the aging carries them over to the next generation in
- * walk_pmd_range(); the eviction also report them when walking the rmap
- * in lru_gen_look_around().
- *
- * For future optimizations:
- * 1. It's not necessary to keep both filters all the time. The spare one can be
- * freed after the RCU grace period and reallocated if needed again.
- * 2. And when reallocating, it's worth scaling its size according to the number
- * of inserted entries in the other filter, to reduce the memory overhead on
- * small systems and false positives on large systems.
- * 3. Jenkins' hash function is an alternative to Knuth's.
- */
-#define BLOOM_FILTER_SHIFT 15
-
-static inline int filter_gen_from_seq(unsigned long seq)
-{
- return seq % NR_BLOOM_FILTERS;
-}
-
-static void get_item_key(void *item, int *key)
-{
- u32 hash = hash_ptr(item, BLOOM_FILTER_SHIFT * 2);
-
- BUILD_BUG_ON(BLOOM_FILTER_SHIFT * 2 > BITS_PER_TYPE(u32));
-
- key[0] = hash & (BIT(BLOOM_FILTER_SHIFT) - 1);
- key[1] = hash >> BLOOM_FILTER_SHIFT;
-}
-
-static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq)
-{
- unsigned long *filter;
- int gen = filter_gen_from_seq(seq);
-
- filter = lruvec->mm_state.filters[gen];
- if (filter) {
- bitmap_clear(filter, 0, BIT(BLOOM_FILTER_SHIFT));
- return;
- }
-
- filter = bitmap_zalloc(BIT(BLOOM_FILTER_SHIFT),
- __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
- WRITE_ONCE(lruvec->mm_state.filters[gen], filter);
-}
-
-static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
-{
- int key[2];
- unsigned long *filter;
- int gen = filter_gen_from_seq(seq);
-
- filter = READ_ONCE(lruvec->mm_state.filters[gen]);
- if (!filter)
- return;
-
- get_item_key(item, key);
-
- if (!test_bit(key[0], filter))
- set_bit(key[0], filter);
- if (!test_bit(key[1], filter))
- set_bit(key[1], filter);
-}
-
-static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
-{
- int key[2];
- unsigned long *filter;
- int gen = filter_gen_from_seq(seq);
-
- filter = READ_ONCE(lruvec->mm_state.filters[gen]);
- if (!filter)
- return true;
-
- get_item_key(item, key);
-
- return test_bit(key[0], filter) && test_bit(key[1], filter);
-}
-
static void reset_mm_stats(struct lruvec *lruvec, struct lru_gen_mm_walk *walk, bool last)
{
int i;
--
2.39.0.314.g84b9a713c41-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH mm-unstable v1 4/7] mm: multi-gen LRU: section for memcg LRU
2023-01-18 0:18 [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve T.J. Alumbaugh
` (2 preceding siblings ...)
2023-01-18 0:18 ` [PATCH mm-unstable v1 3/7] mm: multi-gen LRU: section for Bloom filters T.J. Alumbaugh
@ 2023-01-18 0:18 ` T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 5/7] mm: multi-gen LRU: improve lru_gen_exit_memcg() T.J. Alumbaugh
` (3 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: T.J. Alumbaugh @ 2023-01-18 0:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, linux-mm, T.J. Alumbaugh
Move memcg LRU code into a dedicated section. Improve the design
doc to outline its architecture.
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
---
Documentation/mm/multigen_lru.rst | 33 +++-
include/linux/mm_inline.h | 17 --
include/linux/mmzone.h | 13 +-
mm/memcontrol.c | 8 +-
mm/vmscan.c | 250 +++++++++++++++++-------------
5 files changed, 178 insertions(+), 143 deletions(-)
diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index 770b5d539856..5f1f6ecbb79b 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -186,9 +186,40 @@ is false positive, the cost is an additional scan of a range of PTEs,
which may yield hot pages anyway. Parameters of the filter itself can
control the false positive rate in the limit.
+Memcg LRU
+---------
+An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
+since each node and memcg combination has an LRU of folios (see
+``mem_cgroup_lruvec()``). Its goal is to improve the scalability of
+global reclaim, which is critical to system-wide memory overcommit in
+data centers. Note that memcg LRU only applies to global reclaim.
+
+The basic structure of an memcg LRU can be understood by an analogy to
+the active/inactive LRU (of folios):
+
+1. It has the young and the old (generations), i.e., the counterparts
+ to the active and the inactive;
+2. The increment of ``max_seq`` triggers promotion, i.e., the
+ counterpart to activation;
+3. Other events trigger similar operations, e.g., offlining an memcg
+ triggers demotion, i.e., the counterpart to deactivation.
+
+In terms of global reclaim, it has two distinct features:
+
+1. Sharding, which allows each thread to start at a random memcg (in
+ the old generation) and improves parallelism;
+2. Eventual fairness, which allows direct reclaim to bail out at will
+ and reduces latency without affecting fairness over some time.
+
+In terms of traversing memcgs during global reclaim, it improves the
+best-case complexity from O(n) to O(1) and does not affect the
+worst-case complexity O(n). Therefore, on average, it has a sublinear
+complexity.
+
Summary
-------
-The multi-gen LRU can be disassembled into the following parts:
+The multi-gen LRU (of folios) can be disassembled into the following
+parts:
* Generations
* Rmap walks
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 26dcbda07e92..de1e622dd366 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -122,18 +122,6 @@ static inline bool lru_gen_in_fault(void)
return current->in_lru_fault;
}
-#ifdef CONFIG_MEMCG
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
- return READ_ONCE(lruvec->lrugen.seg);
-}
-#else
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
- return 0;
-}
-#endif
-
static inline int lru_gen_from_seq(unsigned long seq)
{
return seq % MAX_NR_GENS;
@@ -309,11 +297,6 @@ static inline bool lru_gen_in_fault(void)
return false;
}
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
- return 0;
-}
-
static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
{
return false;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ae7d4e92c12d..c54964979ccf 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -368,15 +368,6 @@ struct page_vma_mapped_walk;
#define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
#define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
-/* see the comment on MEMCG_NR_GENS */
-enum {
- MEMCG_LRU_NOP,
- MEMCG_LRU_HEAD,
- MEMCG_LRU_TAIL,
- MEMCG_LRU_OLD,
- MEMCG_LRU_YOUNG,
-};
-
#ifdef CONFIG_LRU_GEN
enum {
@@ -557,7 +548,7 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg);
void lru_gen_online_memcg(struct mem_cgroup *memcg);
void lru_gen_offline_memcg(struct mem_cgroup *memcg);
void lru_gen_release_memcg(struct mem_cgroup *memcg);
-void lru_gen_rotate_memcg(struct lruvec *lruvec, int op);
+void lru_gen_soft_reclaim(struct lruvec *lruvec);
#else /* !CONFIG_MEMCG */
@@ -608,7 +599,7 @@ static inline void lru_gen_release_memcg(struct mem_cgroup *memcg)
{
}
-static inline void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+static inline void lru_gen_soft_reclaim(struct lruvec *lruvec)
{
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 893427aded01..17335459d8dc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -476,12 +476,8 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid)
struct mem_cgroup_tree_per_node *mctz;
if (lru_gen_enabled()) {
- struct lruvec *lruvec = &memcg->nodeinfo[nid]->lruvec;
-
- /* see the comment on MEMCG_NR_GENS */
- if (soft_limit_excess(memcg) && lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
-
+ if (soft_limit_excess(memcg))
+ lru_gen_soft_reclaim(&memcg->nodeinfo[nid]->lruvec);
return;
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1be9120349f8..796d4ca65e97 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4705,6 +4705,148 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
mem_cgroup_unlock_pages();
}
+/******************************************************************************
+ * memcg LRU
+ ******************************************************************************/
+
+/* see the comment on MEMCG_NR_GENS */
+enum {
+ MEMCG_LRU_NOP,
+ MEMCG_LRU_HEAD,
+ MEMCG_LRU_TAIL,
+ MEMCG_LRU_OLD,
+ MEMCG_LRU_YOUNG,
+};
+
+#ifdef CONFIG_MEMCG
+
+static int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return READ_ONCE(lruvec->lrugen.seg);
+}
+
+static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+ int seg;
+ int old, new;
+ int bin = get_random_u32_below(MEMCG_NR_BINS);
+ struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ seg = 0;
+ new = old = lruvec->lrugen.gen;
+
+ /* see the comment on MEMCG_NR_GENS */
+ if (op == MEMCG_LRU_HEAD)
+ seg = MEMCG_LRU_HEAD;
+ else if (op == MEMCG_LRU_TAIL)
+ seg = MEMCG_LRU_TAIL;
+ else if (op == MEMCG_LRU_OLD)
+ new = get_memcg_gen(pgdat->memcg_lru.seq);
+ else if (op == MEMCG_LRU_YOUNG)
+ new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
+ else
+ VM_WARN_ON_ONCE(true);
+
+ hlist_nulls_del_rcu(&lruvec->lrugen.list);
+
+ if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
+ hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+ else
+ hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+
+ pgdat->memcg_lru.nr_memcgs[old]--;
+ pgdat->memcg_lru.nr_memcgs[new]++;
+
+ lruvec->lrugen.gen = new;
+ WRITE_ONCE(lruvec->lrugen.seg, seg);
+
+ if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
+ WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+}
+
+void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+ int gen;
+ int nid;
+ int bin = get_random_u32_below(MEMCG_NR_BINS);
+
+ for_each_node(nid) {
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ gen = get_memcg_gen(pgdat->memcg_lru.seq);
+
+ hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
+ pgdat->memcg_lru.nr_memcgs[gen]++;
+
+ lruvec->lrugen.gen = gen;
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+ }
+}
+
+void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+ int nid;
+
+ for_each_node(nid) {
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
+ }
+}
+
+void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+ int gen;
+ int nid;
+
+ for_each_node(nid) {
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ gen = lruvec->lrugen.gen;
+
+ hlist_nulls_del_rcu(&lruvec->lrugen.list);
+ pgdat->memcg_lru.nr_memcgs[gen]--;
+
+ if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
+ WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+ }
+}
+
+void lru_gen_soft_reclaim(struct lruvec *lruvec)
+{
+ /* see the comment on MEMCG_NR_GENS */
+ if (lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
+}
+
+#else /* !CONFIG_MEMCG */
+
+static int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return 0;
+}
+
+#endif
+
/******************************************************************************
* the eviction
******************************************************************************/
@@ -5397,53 +5539,6 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
pgdat->kswapd_failures = 0;
}
-#ifdef CONFIG_MEMCG
-void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
-{
- int seg;
- int old, new;
- int bin = get_random_u32_below(MEMCG_NR_BINS);
- struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-
- spin_lock(&pgdat->memcg_lru.lock);
-
- VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- seg = 0;
- new = old = lruvec->lrugen.gen;
-
- /* see the comment on MEMCG_NR_GENS */
- if (op == MEMCG_LRU_HEAD)
- seg = MEMCG_LRU_HEAD;
- else if (op == MEMCG_LRU_TAIL)
- seg = MEMCG_LRU_TAIL;
- else if (op == MEMCG_LRU_OLD)
- new = get_memcg_gen(pgdat->memcg_lru.seq);
- else if (op == MEMCG_LRU_YOUNG)
- new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
- else
- VM_WARN_ON_ONCE(true);
-
- hlist_nulls_del_rcu(&lruvec->lrugen.list);
-
- if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
- hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
- else
- hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
-
- pgdat->memcg_lru.nr_memcgs[old]--;
- pgdat->memcg_lru.nr_memcgs[new]++;
-
- lruvec->lrugen.gen = new;
- WRITE_ONCE(lruvec->lrugen.seg, seg);
-
- if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
- WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
- spin_unlock(&pgdat->memcg_lru.lock);
-}
-#endif
-
/******************************************************************************
* state change
******************************************************************************/
@@ -6086,67 +6181,6 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg)
}
}
-void lru_gen_online_memcg(struct mem_cgroup *memcg)
-{
- int gen;
- int nid;
- int bin = get_random_u32_below(MEMCG_NR_BINS);
-
- for_each_node(nid) {
- struct pglist_data *pgdat = NODE_DATA(nid);
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- spin_lock(&pgdat->memcg_lru.lock);
-
- VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- gen = get_memcg_gen(pgdat->memcg_lru.seq);
-
- hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
- pgdat->memcg_lru.nr_memcgs[gen]++;
-
- lruvec->lrugen.gen = gen;
-
- spin_unlock(&pgdat->memcg_lru.lock);
- }
-}
-
-void lru_gen_offline_memcg(struct mem_cgroup *memcg)
-{
- int nid;
-
- for_each_node(nid) {
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
- }
-}
-
-void lru_gen_release_memcg(struct mem_cgroup *memcg)
-{
- int gen;
- int nid;
-
- for_each_node(nid) {
- struct pglist_data *pgdat = NODE_DATA(nid);
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- spin_lock(&pgdat->memcg_lru.lock);
-
- VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- gen = lruvec->lrugen.gen;
-
- hlist_nulls_del_rcu(&lruvec->lrugen.list);
- pgdat->memcg_lru.nr_memcgs[gen]--;
-
- if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
- WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
- spin_unlock(&pgdat->memcg_lru.lock);
- }
-}
-
#endif /* CONFIG_MEMCG */
static int __init init_lru_gen(void)
--
2.39.0.314.g84b9a713c41-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH mm-unstable v1 5/7] mm: multi-gen LRU: improve lru_gen_exit_memcg()
2023-01-18 0:18 [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve T.J. Alumbaugh
` (3 preceding siblings ...)
2023-01-18 0:18 ` [PATCH mm-unstable v1 4/7] mm: multi-gen LRU: section for memcg LRU T.J. Alumbaugh
@ 2023-01-18 0:18 ` T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 6/7] mm: multi-gen LRU: improve walk_pmd_range() T.J. Alumbaugh
` (2 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: T.J. Alumbaugh @ 2023-01-18 0:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, linux-mm, T.J. Alumbaugh
Add warnings and poison ->next.
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
---
mm/vmscan.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 796d4ca65e97..c2e6ad53447b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6168,12 +6168,17 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg)
int i;
int nid;
+ VM_WARN_ON_ONCE(!list_empty(&memcg->mm_list.fifo));
+
for_each_node(nid) {
struct lruvec *lruvec = get_lruvec(memcg, nid);
+ VM_WARN_ON_ONCE(lruvec->mm_state.nr_walkers);
VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0,
sizeof(lruvec->lrugen.nr_pages)));
+ lruvec->lrugen.list.next = LIST_POISON1;
+
for (i = 0; i < NR_BLOOM_FILTERS; i++) {
bitmap_free(lruvec->mm_state.filters[i]);
lruvec->mm_state.filters[i] = NULL;
--
2.39.0.314.g84b9a713c41-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH mm-unstable v1 6/7] mm: multi-gen LRU: improve walk_pmd_range()
2023-01-18 0:18 [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve T.J. Alumbaugh
` (4 preceding siblings ...)
2023-01-18 0:18 ` [PATCH mm-unstable v1 5/7] mm: multi-gen LRU: improve lru_gen_exit_memcg() T.J. Alumbaugh
@ 2023-01-18 0:18 ` T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 7/7] mm: multi-gen LRU: simplify lru_gen_look_around() T.J. Alumbaugh
2023-01-25 21:16 ` [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve Yu Zhao
7 siblings, 0 replies; 10+ messages in thread
From: T.J. Alumbaugh @ 2023-01-18 0:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, linux-mm, T.J. Alumbaugh
Improve readability of walk_pmd_range() and walk_pmd_range_locked().
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
---
mm/vmscan.c | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c2e6ad53447b..ff3b4aa3c31f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3999,8 +3999,8 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
}
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
-static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area_struct *vma,
- struct mm_walk *args, unsigned long *bitmap, unsigned long *start)
+static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
+ struct mm_walk *args, unsigned long *bitmap, unsigned long *first)
{
int i;
pmd_t *pmd;
@@ -4013,18 +4013,19 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area
VM_WARN_ON_ONCE(pud_leaf(*pud));
/* try to batch at most 1+MIN_LRU_BATCH+1 entries */
- if (*start == -1) {
- *start = next;
+ if (*first == -1) {
+ *first = addr;
+ bitmap_zero(bitmap, MIN_LRU_BATCH);
return;
}
- i = next == -1 ? 0 : pmd_index(next) - pmd_index(*start);
+ i = addr == -1 ? 0 : pmd_index(addr) - pmd_index(*first);
if (i && i <= MIN_LRU_BATCH) {
__set_bit(i - 1, bitmap);
return;
}
- pmd = pmd_offset(pud, *start);
+ pmd = pmd_offset(pud, *first);
ptl = pmd_lockptr(args->mm, pmd);
if (!spin_trylock(ptl))
@@ -4035,15 +4036,16 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area
do {
unsigned long pfn;
struct folio *folio;
- unsigned long addr = i ? (*start & PMD_MASK) + i * PMD_SIZE : *start;
+
+ /* don't round down the first address */
+ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first;
pfn = get_pmd_pfn(pmd[i], vma, addr);
if (pfn == -1)
goto next;
if (!pmd_trans_huge(pmd[i])) {
- if (arch_has_hw_nonleaf_pmd_young() &&
- get_cap(LRU_GEN_NONLEAF_YOUNG))
+ if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG))
pmdp_test_and_clear_young(vma, addr, pmd + i);
goto next;
}
@@ -4072,12 +4074,11 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area
arch_leave_lazy_mmu_mode();
spin_unlock(ptl);
done:
- *start = -1;
- bitmap_zero(bitmap, MIN_LRU_BATCH);
+ *first = -1;
}
#else
-static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area_struct *vma,
- struct mm_walk *args, unsigned long *bitmap, unsigned long *start)
+static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
+ struct mm_walk *args, unsigned long *bitmap, unsigned long *first)
{
}
#endif
@@ -4090,9 +4091,9 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
unsigned long next;
unsigned long addr;
struct vm_area_struct *vma;
- unsigned long pos = -1;
+ unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)];
+ unsigned long first = -1;
struct lru_gen_mm_walk *walk = args->private;
- unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)] = {};
VM_WARN_ON_ONCE(pud_leaf(*pud));
@@ -4131,18 +4132,17 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
continue;
- walk_pmd_range_locked(pud, addr, vma, args, bitmap, &pos);
+ walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
continue;
}
#endif
walk->mm_stats[MM_NONLEAF_TOTAL]++;
- if (arch_has_hw_nonleaf_pmd_young() &&
- get_cap(LRU_GEN_NONLEAF_YOUNG)) {
+ if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG)) {
if (!pmd_young(val))
continue;
- walk_pmd_range_locked(pud, addr, vma, args, bitmap, &pos);
+ walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
}
if (!walk->force_scan && !test_bloom_filter(walk->lruvec, walk->max_seq, pmd + i))
@@ -4159,7 +4159,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
update_bloom_filter(walk->lruvec, walk->max_seq + 1, pmd + i);
}
- walk_pmd_range_locked(pud, -1, vma, args, bitmap, &pos);
+ walk_pmd_range_locked(pud, -1, vma, args, bitmap, &first);
if (i < PTRS_PER_PMD && get_next_vma(PUD_MASK, PMD_SIZE, args, &start, &end))
goto restart;
--
2.39.0.314.g84b9a713c41-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH mm-unstable v1 7/7] mm: multi-gen LRU: simplify lru_gen_look_around()
2023-01-18 0:18 [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve T.J. Alumbaugh
` (5 preceding siblings ...)
2023-01-18 0:18 ` [PATCH mm-unstable v1 6/7] mm: multi-gen LRU: improve walk_pmd_range() T.J. Alumbaugh
@ 2023-01-18 0:18 ` T.J. Alumbaugh
2023-01-25 21:16 ` [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve Yu Zhao
7 siblings, 0 replies; 10+ messages in thread
From: T.J. Alumbaugh @ 2023-01-18 0:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, linux-mm, T.J. Alumbaugh
Update the folio generation in place with or without
current->reclaim_state->mm_walk. The LRU lock is held for longer, if
mm_walk is NULL and the number of folios to update is more than
PAGEVEC_SIZE.
This causes a measurable regression from the LRU lock contention
during a microbencmark. But a tiny regression is not worth the
complexity.
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
---
mm/vmscan.c | 73 +++++++++++++++++------------------------------------
1 file changed, 23 insertions(+), 50 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ff3b4aa3c31f..ac51150d2d36 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4587,13 +4587,12 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
{
int i;
- pte_t *pte;
unsigned long start;
unsigned long end;
- unsigned long addr;
struct lru_gen_mm_walk *walk;
int young = 0;
- unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)] = {};
+ pte_t *pte = pvmw->pte;
+ unsigned long addr = pvmw->address;
struct folio *folio = pfn_folio(pvmw->pfn);
struct mem_cgroup *memcg = folio_memcg(folio);
struct pglist_data *pgdat = folio_pgdat(folio);
@@ -4610,25 +4609,28 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
/* avoid taking the LRU lock under the PTL when possible */
walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL;
- start = max(pvmw->address & PMD_MASK, pvmw->vma->vm_start);
- end = min(pvmw->address | ~PMD_MASK, pvmw->vma->vm_end - 1) + 1;
+ start = max(addr & PMD_MASK, pvmw->vma->vm_start);
+ end = min(addr | ~PMD_MASK, pvmw->vma->vm_end - 1) + 1;
if (end - start > MIN_LRU_BATCH * PAGE_SIZE) {
- if (pvmw->address - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
+ if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
end = start + MIN_LRU_BATCH * PAGE_SIZE;
- else if (end - pvmw->address < MIN_LRU_BATCH * PAGE_SIZE / 2)
+ else if (end - addr < MIN_LRU_BATCH * PAGE_SIZE / 2)
start = end - MIN_LRU_BATCH * PAGE_SIZE;
else {
- start = pvmw->address - MIN_LRU_BATCH * PAGE_SIZE / 2;
- end = pvmw->address + MIN_LRU_BATCH * PAGE_SIZE / 2;
+ start = addr - MIN_LRU_BATCH * PAGE_SIZE / 2;
+ end = addr + MIN_LRU_BATCH * PAGE_SIZE / 2;
}
}
- pte = pvmw->pte - (pvmw->address - start) / PAGE_SIZE;
+ /* folio_update_gen() requires stable folio_memcg() */
+ if (!mem_cgroup_trylock_pages(memcg))
+ return;
- rcu_read_lock();
arch_enter_lazy_mmu_mode();
+ pte -= (addr - start) / PAGE_SIZE;
+
for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
unsigned long pfn;
@@ -4653,56 +4655,27 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
!folio_test_swapcache(folio)))
folio_mark_dirty(folio);
+ if (walk) {
+ old_gen = folio_update_gen(folio, new_gen);
+ if (old_gen >= 0 && old_gen != new_gen)
+ update_batch_size(walk, folio, old_gen, new_gen);
+
+ continue;
+ }
+
old_gen = folio_lru_gen(folio);
if (old_gen < 0)
folio_set_referenced(folio);
else if (old_gen != new_gen)
- __set_bit(i, bitmap);
+ folio_activate(folio);
}
arch_leave_lazy_mmu_mode();
- rcu_read_unlock();
+ mem_cgroup_unlock_pages();
/* feedback from rmap walkers to page table walkers */
if (suitable_to_scan(i, young))
update_bloom_filter(lruvec, max_seq, pvmw->pmd);
-
- if (!walk && bitmap_weight(bitmap, MIN_LRU_BATCH) < PAGEVEC_SIZE) {
- for_each_set_bit(i, bitmap, MIN_LRU_BATCH) {
- folio = pfn_folio(pte_pfn(pte[i]));
- folio_activate(folio);
- }
- return;
- }
-
- /* folio_update_gen() requires stable folio_memcg() */
- if (!mem_cgroup_trylock_pages(memcg))
- return;
-
- if (!walk) {
- spin_lock_irq(&lruvec->lru_lock);
- new_gen = lru_gen_from_seq(lruvec->lrugen.max_seq);
- }
-
- for_each_set_bit(i, bitmap, MIN_LRU_BATCH) {
- folio = pfn_folio(pte_pfn(pte[i]));
- if (folio_memcg_rcu(folio) != memcg)
- continue;
-
- old_gen = folio_update_gen(folio, new_gen);
- if (old_gen < 0 || old_gen == new_gen)
- continue;
-
- if (walk)
- update_batch_size(walk, folio, old_gen, new_gen);
- else
- lru_gen_update_size(lruvec, folio, old_gen, new_gen);
- }
-
- if (!walk)
- spin_unlock_irq(&lruvec->lru_lock);
-
- mem_cgroup_unlock_pages();
}
/******************************************************************************
--
2.39.0.314.g84b9a713c41-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH mm-unstable v1 1/7] mm: multi-gen LRU: section for working set protection
2023-01-18 0:18 ` [PATCH mm-unstable v1 1/7] mm: multi-gen LRU: section for working set protection T.J. Alumbaugh
@ 2023-01-18 1:00 ` Matthew Wilcox
0 siblings, 0 replies; 10+ messages in thread
From: Matthew Wilcox @ 2023-01-18 1:00 UTC (permalink / raw)
To: T.J. Alumbaugh; +Cc: Andrew Morton, linux-mm, linux-kernel, linux-mm
On Wed, Jan 18, 2023 at 12:18:21AM +0000, T.J. Alumbaugh wrote:
> +++ b/mm/vmscan.c
> @@ -4475,6 +4475,10 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
> return true;
> }
>
> +/******************************************************************************
> + * working set protection
> + ******************************************************************************/
> +
> static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
We don't usually do this. Why do you want to?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve
2023-01-18 0:18 [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve T.J. Alumbaugh
` (6 preceding siblings ...)
2023-01-18 0:18 ` [PATCH mm-unstable v1 7/7] mm: multi-gen LRU: simplify lru_gen_look_around() T.J. Alumbaugh
@ 2023-01-25 21:16 ` Yu Zhao
7 siblings, 0 replies; 10+ messages in thread
From: Yu Zhao @ 2023-01-25 21:16 UTC (permalink / raw)
To: T.J. Alumbaugh; +Cc: Andrew Morton, linux-mm, linux-kernel, linux-mm
On Tue, Jan 17, 2023 at 5:18 PM T.J. Alumbaugh <talumbau@google.com> wrote:
>
> This patch series improves a few MGLRU functions, collects related
> functions, and adds additional documentation.
>
> T.J. Alumbaugh (7):
> mm: multi-gen LRU: section for working set protection
> mm: multi-gen LRU: section for rmap/PT walk feedback
> mm: multi-gen LRU: section for Bloom filters
> mm: multi-gen LRU: section for memcg LRU
> mm: multi-gen LRU: improve lru_gen_exit_memcg()
> mm: multi-gen LRU: improve walk_pmd_range()
> mm: multi-gen LRU: simplify lru_gen_look_around()
>
> Documentation/mm/multigen_lru.rst | 78 ++++-
> include/linux/mm_inline.h | 17 -
> include/linux/mmzone.h | 13 +-
> mm/memcontrol.c | 8 +-
> mm/vmscan.c | 534 ++++++++++++++++--------------
> 5 files changed, 360 insertions(+), 290 deletions(-)
LGTM, thanks.
The design doc is still missing two sections (PID and mm_struct list),
and someone from our team will follow up on that.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-01-25 21:17 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-18 0:18 [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 1/7] mm: multi-gen LRU: section for working set protection T.J. Alumbaugh
2023-01-18 1:00 ` Matthew Wilcox
2023-01-18 0:18 ` [PATCH mm-unstable v1 2/7] mm: multi-gen LRU: section for rmap/PT walk feedback T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 3/7] mm: multi-gen LRU: section for Bloom filters T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 4/7] mm: multi-gen LRU: section for memcg LRU T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 5/7] mm: multi-gen LRU: improve lru_gen_exit_memcg() T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 6/7] mm: multi-gen LRU: improve walk_pmd_range() T.J. Alumbaugh
2023-01-18 0:18 ` [PATCH mm-unstable v1 7/7] mm: multi-gen LRU: simplify lru_gen_look_around() T.J. Alumbaugh
2023-01-25 21:16 ` [PATCH mm-unstable v1 0/7] mm: multi-gen LRU: improve Yu Zhao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).