* [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily
@ 2021-03-02 21:09 Minchan Kim
2021-03-02 21:09 ` [PATCH 2/2] mm: fs: Invalidate BH LRU during page migration Minchan Kim
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Minchan Kim @ 2021-03-02 21:09 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, LKML, joaodias, surenb, cgoldswo, willy, mhocko, david,
vbabka, linux-fsdevel, Minchan Kim
LRU pagevecs hold a refcount on their pages until the pagevec is
drained. This can prevent migration, because the page's refcount is
then greater than what the migration logic expects. To mitigate the
issue, callers of migrate_pages drain the LRU pagevecs via
migrate_prep or lru_add_drain_all before calling migrate_pages.
However, that is not enough: pages that enter a pagevec after the
draining call can still sit in the pagevec and keep preventing page
migration. Some callers of migrate_pages have retry logic with LRU
draining, so such a page would migrate on the next trial, but this is
still fragile because it doesn't close the fundamental race between
pages entering a pagevec and the ongoing migration, and the resulting
migration failures can cause contiguous memory allocation failures in
the end.
To close the race, this patch disables the LRU caches (i.e.,
pagevecs) while a migration is ongoing, until the migration is done.
Since the race is really hard to reproduce, I measured how many times
migrate_pages retried in force mode using the debug code below.
int migrate_pages(struct list_head *from, new_page_t get_new_page,
..
..
if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
printk(KERN_ERR "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
dump_page(page, "fail to migrate");
}
The test repeatedly launched Android apps with CMA allocation running
in the background every five seconds. The total CMA allocation count
was about 500 during the test. With this patch, the dump_page count
was reduced from 400 to 30.
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
* from RFC - http://lore.kernel.org/linux-mm/20210216170348.1513483-1-minchan@kernel.org
* use atomic and lru_add_drain_all for strict ordering - mhocko
* lru_cache_disable/enable - mhocko
fs/block_dev.c | 2 +-
include/linux/migrate.h | 6 +++--
include/linux/swap.h | 4 ++-
mm/compaction.c | 4 +--
mm/fadvise.c | 2 +-
mm/gup.c | 2 +-
mm/khugepaged.c | 2 +-
mm/ksm.c | 2 +-
mm/memcontrol.c | 4 +--
mm/memfd.c | 2 +-
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 2 +-
mm/mempolicy.c | 6 +++++
mm/migrate.c | 15 ++++++-----
mm/page_alloc.c | 5 +++-
mm/swap.c | 55 +++++++++++++++++++++++++++++++++++------
16 files changed, 85 insertions(+), 30 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7a814a13f9a4..1fe75dbd0ce0 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -93,7 +93,7 @@ void invalidate_bdev(struct block_device *bdev)
if (mapping->nrpages) {
invalidate_bh_lrus();
- lru_add_drain_all(); /* make sure all lru add caches are flushed */
+ lru_add_drain_all(false); /* make sure all lru add caches are flushed */
invalidate_mapping_pages(mapping, 0, -1);
}
/* 99% of the time, we don't need to flush the cleancache on the bdev.
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3a389633b68f..6a23174ea081 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -47,6 +47,7 @@ extern void putback_movable_page(struct page *page);
extern void migrate_prep(void);
extern void migrate_prep_local(void);
+extern void migrate_finish(void);
extern void migrate_page_states(struct page *newpage, struct page *page);
extern void migrate_page_copy(struct page *newpage, struct page *page);
extern int migrate_huge_page_move_mapping(struct address_space *mapping,
@@ -66,8 +67,9 @@ static inline struct page *alloc_migration_target(struct page *page,
static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
-static inline int migrate_prep(void) { return -ENOSYS; }
-static inline int migrate_prep_local(void) { return -ENOSYS; }
+static inline void migrate_prep(void) {}
+static inline void migrate_prep_local(void) {}
+static inline void migrate_finish(void) {}
static inline void migrate_page_states(struct page *newpage, struct page *page)
{
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 71166bc10d17..8ab7ad7157f3 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -339,10 +339,12 @@ extern void lru_note_cost(struct lruvec *lruvec, bool file,
extern void lru_note_cost_page(struct page *);
extern void lru_cache_add(struct page *);
extern void mark_page_accessed(struct page *);
+extern void lru_cache_disable(void);
+extern void lru_cache_enable(void);
extern void lru_add_drain(void);
extern void lru_add_drain_cpu(int cpu);
extern void lru_add_drain_cpu_zone(struct zone *zone);
-extern void lru_add_drain_all(void);
+extern void lru_add_drain_all(bool force_all_cpus);
extern void rotate_reclaimable_page(struct page *page);
extern void deactivate_file_page(struct page *page);
extern void deactivate_page(struct page *page);
diff --git a/mm/compaction.c b/mm/compaction.c
index 50b31e2ec6cb..519f6c241e69 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2648,7 +2648,7 @@ static void compact_nodes(void)
int nid;
/* Flush pending updates to the LRU lists */
- lru_add_drain_all();
+ lru_add_drain_all(false);
for_each_online_node(nid)
compact_node(nid);
@@ -2686,7 +2686,7 @@ static ssize_t sysfs_compact_node(struct device *dev,
if (nid >= 0 && nid < nr_node_ids && node_online(nid)) {
/* Flush pending updates to the LRU lists */
- lru_add_drain_all();
+ lru_add_drain_all(false);
compact_node(nid);
}
diff --git a/mm/fadvise.c b/mm/fadvise.c
index d6baa4f451c5..6053cd878b18 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -165,7 +165,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
* pagevecs and try again.
*/
if (nr_pagevec) {
- lru_add_drain_all();
+ lru_add_drain_all(false);
invalidate_mapping_pages(mapping, start_index,
end_index);
}
diff --git a/mm/gup.c b/mm/gup.c
index 3e086b073624..2d51edd1601b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1586,7 +1586,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
isolate_huge_page(head, &cma_page_list);
else {
if (!PageLRU(head) && drain_allow) {
- lru_add_drain_all();
+ lru_add_drain_all(false);
drain_allow = false;
}
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8369d9620f6d..65d08a35be08 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2211,7 +2211,7 @@ static void khugepaged_do_scan(void)
barrier(); /* write khugepaged_pages_to_scan to local stack */
- lru_add_drain_all();
+ lru_add_drain_all(false);
while (progress < pages) {
if (!khugepaged_prealloc_page(&hpage, &wait))
diff --git a/mm/ksm.c b/mm/ksm.c
index 9694ee2c71de..3559e2c92ebf 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2245,7 +2245,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
* them here (here rather than on entry to ksm_do_scan(),
* so we don't IPI too often when pages_to_scan is set low).
*/
- lru_add_drain_all();
+ lru_add_drain_all(false);
/*
* Whereas stale stable_nodes on the stable_tree itself
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ed5cc78a8dbf..12f70487915e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3520,7 +3520,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
int nr_retries = MAX_RECLAIM_RETRIES;
/* we call try-to-free pages for make this cgroup empty */
- lru_add_drain_all();
+ lru_add_drain_all(false);
drain_all_stock(memcg);
@@ -6140,7 +6140,7 @@ static const struct mm_walk_ops charge_walk_ops = {
static void mem_cgroup_move_charge(void)
{
- lru_add_drain_all();
+ lru_add_drain_all(false);
/*
* Signal lock_page_memcg() to take the memcg's move_lock
* while we're moving its pages to another memcg. Then wait
diff --git a/mm/memfd.c b/mm/memfd.c
index 2647c898990c..79d6e004a89b 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -79,7 +79,7 @@ static int memfd_wait_for_pins(struct address_space *mapping)
break;
if (!scan)
- lru_add_drain_all();
+ lru_add_drain_all(false);
else if (schedule_timeout_killable((HZ << scan) / 200))
scan = LAST_SCAN;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 4e3684d694c1..aba281dc58d7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -272,7 +272,7 @@ void shake_page(struct page *p, int access)
return;
if (!PageSlab(p)) {
- lru_add_drain_all();
+ lru_add_drain_all(false);
if (PageLRU(p) || is_free_buddy_page(p))
return;
}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a969463bdda4..56d224e8e7f6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1602,7 +1602,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
}
cond_resched();
- lru_add_drain_all();
+ lru_add_drain_all(false);
ret = scan_movable_pages(pfn, end_pfn, &pfn);
if (!ret) {
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 6961238c7ef5..46d9986c7bf0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1208,6 +1208,8 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
break;
}
mmap_read_unlock(mm);
+ migrate_finish();
+
if (err < 0)
return err;
return busy;
@@ -1371,6 +1373,10 @@ static long do_mbind(unsigned long start, unsigned long len,
mmap_write_unlock(mm);
mpol_out:
mpol_put(new);
+
+ if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+ migrate_finish();
+
return err;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index a69da8aaeccd..bcf4637f6950 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -65,12 +65,9 @@
void migrate_prep(void)
{
/*
- * Clear the LRU lists so pages can be isolated.
- * Note that pages may be moved off the LRU after we have
- * drained them. Those pages will fail to migrate like other
- * pages that may be busy.
+ * Clear the LRU pcp lists so pages can be isolated.
*/
- lru_add_drain_all();
+ lru_cache_disable();
}
/* Do the necessary work of migrate_prep but not if it involves other CPUs */
@@ -79,6 +76,11 @@ void migrate_prep_local(void)
lru_add_drain();
}
+void migrate_finish(void)
+{
+ lru_cache_enable();
+}
+
int isolate_movable_page(struct page *page, isolate_mode_t mode)
{
struct address_space *mapping;
@@ -1837,6 +1839,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
if (err >= 0)
err = err1;
out:
+ migrate_finish();
return err;
}
@@ -2673,7 +2676,7 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
if (!is_zone_device_page(page)) {
if (!PageLRU(page) && allow_drain) {
/* Drain CPU's pagevec */
- lru_add_drain_all();
+ lru_add_drain_all(false);
allow_drain = false;
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6446778cbc6b..9214fdada691 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8493,6 +8493,9 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
ret = migrate_pages(&cc->migratepages, alloc_migration_target,
NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
}
+
+ migrate_finish();
+
if (ret < 0) {
putback_movable_pages(&cc->migratepages);
return ret;
@@ -8603,7 +8606,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
* isolated thus they won't get removed from buddy.
*/
- lru_add_drain_all();
+ lru_add_drain_all(false);
order = 0;
outer_start = start;
diff --git a/mm/swap.c b/mm/swap.c
index 31b844d4ed94..c1fa6cac04c1 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -54,6 +54,32 @@ static DEFINE_PER_CPU(struct lru_rotate, lru_rotate) = {
.lock = INIT_LOCAL_LOCK(lock),
};
+static atomic_t lru_disable_count = ATOMIC_INIT(0);
+
+bool lru_cache_disabled(void)
+{
+ return atomic_read(&lru_disable_count);
+}
+
+void lru_cache_disable(void)
+{
+ /*
+ * lru_add_drain_all's IPI will make sure no new pages are added
+ * to the pcp lists and drain them all.
+ */
+ atomic_inc(&lru_disable_count);
+
+ /*
+ * Clear the LRU lists so pages can be isolated.
+ */
+ lru_add_drain_all(true);
+}
+
+void lru_cache_enable(void)
+{
+ atomic_dec(&lru_disable_count);
+}
+
/*
* The following struct pagevec are grouped together because they are protected
* by disabling preemption (and interrupts remain enabled).
@@ -235,6 +261,18 @@ static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec)
}
}
+/* return true if pagevec needs to drain */
+static bool pagevec_add_and_need_flush(struct pagevec *pvec, struct page *page)
+{
+ bool ret = false;
+
+ if (!pagevec_add(pvec, page) || PageCompound(page) ||
+ lru_cache_disabled())
+ ret = true;
+
+ return ret;
+}
+
/*
* Writeback is about to end against a page which has been marked for immediate
* reclaim. If it still appears to be reclaimable, move it to the tail of the
@@ -252,7 +290,7 @@ void rotate_reclaimable_page(struct page *page)
get_page(page);
local_lock_irqsave(&lru_rotate.lock, flags);
pvec = this_cpu_ptr(&lru_rotate.pvec);
- if (!pagevec_add(pvec, page) || PageCompound(page))
+ if (pagevec_add_and_need_flush(pvec, page))
pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
local_unlock_irqrestore(&lru_rotate.lock, flags);
}
@@ -343,7 +381,7 @@ static void activate_page(struct page *page)
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.activate_page);
get_page(page);
- if (!pagevec_add(pvec, page) || PageCompound(page))
+ if (pagevec_add_and_need_flush(pvec, page))
pagevec_lru_move_fn(pvec, __activate_page);
local_unlock(&lru_pvecs.lock);
}
@@ -458,7 +496,7 @@ void lru_cache_add(struct page *page)
get_page(page);
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.lru_add);
- if (!pagevec_add(pvec, page) || PageCompound(page))
+ if (pagevec_add_and_need_flush(pvec, page))
__pagevec_lru_add(pvec);
local_unlock(&lru_pvecs.lock);
}
@@ -654,7 +692,7 @@ void deactivate_file_page(struct page *page)
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate_file);
- if (!pagevec_add(pvec, page) || PageCompound(page))
+ if (pagevec_add_and_need_flush(pvec, page))
pagevec_lru_move_fn(pvec, lru_deactivate_file_fn);
local_unlock(&lru_pvecs.lock);
}
@@ -676,7 +714,7 @@ void deactivate_page(struct page *page)
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate);
get_page(page);
- if (!pagevec_add(pvec, page) || PageCompound(page))
+ if (pagevec_add_and_need_flush(pvec, page))
pagevec_lru_move_fn(pvec, lru_deactivate_fn);
local_unlock(&lru_pvecs.lock);
}
@@ -698,7 +736,7 @@ void mark_page_lazyfree(struct page *page)
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.lru_lazyfree);
get_page(page);
- if (!pagevec_add(pvec, page) || PageCompound(page))
+ if (pagevec_add_and_need_flush(pvec, page))
pagevec_lru_move_fn(pvec, lru_lazyfree_fn);
local_unlock(&lru_pvecs.lock);
}
@@ -735,7 +773,7 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
* Calling this function with cpu hotplug locks held can actually lead
* to obscure indirect dependencies via WQ context.
*/
-void lru_add_drain_all(void)
+void lru_add_drain_all(bool force_all_cpus)
{
/*
* lru_drain_gen - Global pages generation number
@@ -810,7 +848,8 @@ void lru_add_drain_all(void)
for_each_online_cpu(cpu) {
struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
- if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
+ if (force_all_cpus ||
+ pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) ||
pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
--
2.30.1.766.gb4fecdf3b7-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/2] mm: fs: Invalidate BH LRU during page migration
2021-03-02 21:09 [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily Minchan Kim
@ 2021-03-02 21:09 ` Minchan Kim
2021-03-03 13:38 ` kernel test robot
2021-03-03 15:55 ` kernel test robot
2021-03-03 12:49 ` [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily Michal Hocko
` (3 subsequent siblings)
4 siblings, 2 replies; 13+ messages in thread
From: Minchan Kim @ 2021-03-02 21:09 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, LKML, joaodias, surenb, cgoldswo, willy, mhocko, david,
vbabka, linux-fsdevel, Minchan Kim
Pages containing buffer_heads that sit in one of the per-CPU
buffer_head LRU caches are pinned and thus cannot be migrated. This
can prevent CMA allocations from succeeding; CMA is often used on
platforms with co-processors (such as a DSP) that can only use
physically contiguous memory. It can also prevent memory hot-unplug
from succeeding, which involves migrating at least
MIN_MEMORY_BLOCK_SIZE bytes of memory, ranging from 8 MiB to 1 GiB
depending on the architecture in use.
Correspondingly, invalidate the BH LRU caches before a migration
starts, and prevent any buffer_head from being cached in the LRU
caches until the migration has finished.
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
* from prev - https://lore.kernel.org/linux-mm/e8f3e042b902156467a5e978b57c14954213ec59.1611642039.git.cgoldswo@codeaurora.org/
* consolidate bh_lru drain logic into lru pagevec - willy
fs/buffer.c | 12 ++++++++++--
include/linux/buffer_head.h | 2 ++
include/linux/swap.h | 1 +
mm/swap.c | 5 ++++-
4 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 96c7604f69b3..4492e9d4c9d3 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1301,6 +1301,14 @@ static void bh_lru_install(struct buffer_head *bh)
int i;
check_irqs_on();
+ /*
+ * A buffer_head in the bh_lru cache holds a reference to its page
+ * until it is invalidated, which causes page migration to fail.
+ * Skip putting upcoming bh into bh_lru until migration is done.
+ */
+ if (lru_cache_disabled())
+ return;
+
bh_lru_lock();
b = this_cpu_ptr(&bh_lrus);
@@ -1446,7 +1454,7 @@ EXPORT_SYMBOL(__bread_gfp);
* This doesn't race because it runs in each cpu either in irq
* or with preempt disabled.
*/
-static void invalidate_bh_lru(void *arg)
+void invalidate_bh_lru(void *arg)
{
struct bh_lru *b = &get_cpu_var(bh_lrus);
int i;
@@ -1458,7 +1466,7 @@ static void invalidate_bh_lru(void *arg)
put_cpu_var(bh_lrus);
}
-static bool has_bh_in_lru(int cpu, void *dummy)
+bool has_bh_in_lru(int cpu, void *dummy)
{
struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu);
int i;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 6b47f94378c5..3d98bdabaac9 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -194,6 +194,8 @@ void __breadahead_gfp(struct block_device *, sector_t block, unsigned int size,
struct buffer_head *__bread_gfp(struct block_device *,
sector_t block, unsigned size, gfp_t gfp);
void invalidate_bh_lrus(void);
+void invalidate_bh_lru(void *);
+bool has_bh_in_lru(int cpu, void *dummy);
struct buffer_head *alloc_buffer_head(gfp_t gfp_flags);
void free_buffer_head(struct buffer_head * bh);
void unlock_buffer(struct buffer_head *bh);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8ab7ad7157f3..94a77e618dba 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -341,6 +341,7 @@ extern void lru_cache_add(struct page *);
extern void mark_page_accessed(struct page *);
extern void lru_cache_disable(void);
extern void lru_cache_enable(void);
+extern bool lru_cache_disabled(void);
extern void lru_add_drain(void);
extern void lru_add_drain_cpu(int cpu);
extern void lru_add_drain_cpu_zone(struct zone *zone);
diff --git a/mm/swap.c b/mm/swap.c
index c1fa6cac04c1..88d51b9ebc8c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -36,6 +36,7 @@
#include <linux/hugetlb.h>
#include <linux/page_idle.h>
#include <linux/local_lock.h>
+#include <linux/buffer_head.h>
#include "internal.h"
@@ -667,6 +668,7 @@ void lru_add_drain_cpu(int cpu)
pagevec_lru_move_fn(pvec, lru_lazyfree_fn);
activate_page_drain(cpu);
+ invalidate_bh_lru(NULL);
}
/**
@@ -854,7 +856,8 @@ void lru_add_drain_all(bool force_all_cpus)
pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
pagevec_count(&per_cpu(lru_pvecs.lru_lazyfree, cpu)) ||
- need_activate_page_drain(cpu)) {
+ need_activate_page_drain(cpu) ||
+ has_bh_in_lru(cpu, NULL)) {
INIT_WORK(work, lru_add_drain_per_cpu);
queue_work_on(cpu, mm_percpu_wq, work);
__cpumask_set_cpu(cpu, &has_work);
--
2.30.1.766.gb4fecdf3b7-goog
* Re: [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily
2021-03-02 21:09 [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily Minchan Kim
2021-03-02 21:09 ` [PATCH 2/2] mm: fs: Invalidate BH LRU during page migration Minchan Kim
@ 2021-03-03 12:49 ` Michal Hocko
2021-03-03 20:23 ` Minchan Kim
2021-03-03 13:38 ` kernel test robot
` (2 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2021-03-03 12:49 UTC (permalink / raw)
To: Minchan Kim
Cc: Andrew Morton, linux-mm, LKML, joaodias, surenb, cgoldswo, willy,
david, vbabka, linux-fsdevel
On Tue 02-03-21 13:09:48, Minchan Kim wrote:
> LRU pagevecs hold a refcount on their pages until the pagevec is
> drained. This can prevent migration, because the page's refcount is
> then greater than what the migration logic expects. To mitigate the
> issue, callers of migrate_pages drain the LRU pagevecs via
> migrate_prep or lru_add_drain_all before calling migrate_pages.
>
> However, that is not enough: pages that enter a pagevec after the
> draining call can still sit in the pagevec and keep preventing page
> migration. Some callers of migrate_pages have retry logic with LRU
> draining, so such a page would migrate on the next trial, but this is
> still fragile because it doesn't close the fundamental race between
> pages entering a pagevec and the ongoing migration, and the resulting
> migration failures can cause contiguous memory allocation failures in
> the end.
>
> To close the race, this patch disables the LRU caches (i.e.,
> pagevecs) while a migration is ongoing, until the migration is done.
>
> Since the race is really hard to reproduce, I measured how many times
> migrate_pages retried in force mode using the debug code below.
>
> int migrate_pages(struct list_head *from, new_page_t get_new_page,
> ..
> ..
>
> if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
> printk(KERN_ERR "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
> dump_page(page, "fail to migrate");
> }
>
> The test repeatedly launched Android apps with CMA allocation running
> in the background every five seconds. The total CMA allocation count
> was about 500 during the test. With this patch, the dump_page count
> was reduced from 400 to 30.
Have you seen any improvement on the CMA allocation success rate?
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
> * from RFC - http://lore.kernel.org/linux-mm/20210216170348.1513483-1-minchan@kernel.org
> * use atomic and lru_add_drain_all for strict ordering - mhocko
> * lru_cache_disable/enable - mhocko
>
> fs/block_dev.c | 2 +-
> include/linux/migrate.h | 6 +++--
> include/linux/swap.h | 4 ++-
> mm/compaction.c | 4 +--
> mm/fadvise.c | 2 +-
> mm/gup.c | 2 +-
> mm/khugepaged.c | 2 +-
> mm/ksm.c | 2 +-
> mm/memcontrol.c | 4 +--
> mm/memfd.c | 2 +-
> mm/memory-failure.c | 2 +-
> mm/memory_hotplug.c | 2 +-
> mm/mempolicy.c | 6 +++++
> mm/migrate.c | 15 ++++++-----
> mm/page_alloc.c | 5 +++-
> mm/swap.c | 55 +++++++++++++++++++++++++++++++++++------
> 16 files changed, 85 insertions(+), 30 deletions(-)
The churn seems quite big for something that should have been a very
small change. Have you considered not changing lru_add_drain_all but
rather introducing a __lru_add_drain_all that would implement the
enforced flushing?
[...]
> +static atomic_t lru_disable_count = ATOMIC_INIT(0);
> +
> +bool lru_cache_disabled(void)
> +{
> + return atomic_read(&lru_disable_count);
> +}
> +
> +void lru_cache_disable(void)
> +{
> + /*
> + * lru_add_drain_all's IPI will make sure no new pages are added
> + * to the pcp lists and drain them all.
> + */
> + atomic_inc(&lru_disable_count);
As already mentioned in the last review. The IPI reference is more
cryptic than useful. I would go with something like this instead
/*
* lru_add_drain_all in the force mode will schedule draining on
* all online CPUs so any calls of lru_cache_disabled wrapped by
* local_lock or preemption disabled would be ordered by that.
* The atomic operation doesn't need to have stronger ordering
* requirements because that is enforced by the scheduling
* guarantees.
*/
> +
> + /*
> + * Clear the LRU lists so pages can be isolated.
> + */
> + lru_add_drain_all(true);
> +}
--
Michal Hocko
SUSE Labs
* Re: [PATCH 2/2] mm: fs: Invalidate BH LRU during page migration
2021-03-02 21:09 ` [PATCH 2/2] mm: fs: Invalidate BH LRU during page migration Minchan Kim
@ 2021-03-03 13:38 ` kernel test robot
2021-03-03 15:55 ` kernel test robot
1 sibling, 0 replies; 13+ messages in thread
From: kernel test robot @ 2021-03-03 13:38 UTC (permalink / raw)
To: Minchan Kim, Andrew Morton
Cc: kbuild-all, Linux Memory Management List, LKML, joaodias, surenb,
cgoldswo, willy, mhocko, david, vbabka
Hi Minchan,
I love your patch! Yet something to improve:
[auto build test ERROR on block/for-next]
[also build test ERROR on linux/master linus/master v5.12-rc1 next-20210303]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210303-191809
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: arm-randconfig-r031-20210303 (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/a1c74fba81d1258e320ef52bc995cb0333e3e083
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210303-191809
git checkout a1c74fba81d1258e320ef52bc995cb0333e3e083
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
mm/swap.c: In function 'lru_add_drain_cpu':
>> mm/swap.c:671:2: error: implicit declaration of function 'invalidate_bh_lru'; did you mean 'invalidate_bdev'? [-Werror=implicit-function-declaration]
671 | invalidate_bh_lru(NULL);
| ^~~~~~~~~~~~~~~~~
| invalidate_bdev
mm/swap.c: At top level:
mm/swap.c:874:6: error: conflicting types for 'lru_add_drain_all'
874 | void lru_add_drain_all(void)
| ^~~~~~~~~~~~~~~~~
In file included from mm/swap.c:20:
include/linux/swap.h:348:13: note: previous declaration of 'lru_add_drain_all' was here
348 | extern void lru_add_drain_all(bool force_all_cpus);
| ^~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +671 mm/swap.c
634
635 /*
636 * Drain pages out of the cpu's pagevecs.
637 * Either "cpu" is the current CPU, and preemption has already been
638 * disabled; or "cpu" is being hot-unplugged, and is already dead.
639 */
640 void lru_add_drain_cpu(int cpu)
641 {
642 struct pagevec *pvec = &per_cpu(lru_pvecs.lru_add, cpu);
643
644 if (pagevec_count(pvec))
645 __pagevec_lru_add(pvec);
646
647 pvec = &per_cpu(lru_rotate.pvec, cpu);
648 /* Disabling interrupts below acts as a compiler barrier. */
649 if (data_race(pagevec_count(pvec))) {
650 unsigned long flags;
651
652 /* No harm done if a racing interrupt already did this */
653 local_lock_irqsave(&lru_rotate.lock, flags);
654 pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
655 local_unlock_irqrestore(&lru_rotate.lock, flags);
656 }
657
658 pvec = &per_cpu(lru_pvecs.lru_deactivate_file, cpu);
659 if (pagevec_count(pvec))
660 pagevec_lru_move_fn(pvec, lru_deactivate_file_fn);
661
662 pvec = &per_cpu(lru_pvecs.lru_deactivate, cpu);
663 if (pagevec_count(pvec))
664 pagevec_lru_move_fn(pvec, lru_deactivate_fn);
665
666 pvec = &per_cpu(lru_pvecs.lru_lazyfree, cpu);
667 if (pagevec_count(pvec))
668 pagevec_lru_move_fn(pvec, lru_lazyfree_fn);
669
670 activate_page_drain(cpu);
> 671 invalidate_bh_lru(NULL);
672 }
673
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
* Re: [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily
2021-03-02 21:09 [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily Minchan Kim
2021-03-02 21:09 ` [PATCH 2/2] mm: fs: Invalidate BH LRU during page migration Minchan Kim
2021-03-03 12:49 ` [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily Michal Hocko
@ 2021-03-03 13:38 ` kernel test robot
2021-03-03 15:11 ` kernel test robot
2021-03-03 18:12 ` kernel test robot
4 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2021-03-03 13:38 UTC (permalink / raw)
To: Minchan Kim, Andrew Morton
Cc: kbuild-all, clang-built-linux, Linux Memory Management List,
LKML, joaodias, surenb, cgoldswo, willy, mhocko, david, vbabka
Hi Minchan,
I love your patch! Yet something to improve:
[auto build test ERROR on block/for-next]
[also build test ERROR on linux/master linus/master v5.12-rc1 next-20210303]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210303-191809
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: s390-randconfig-r034-20210303 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project a7cad6680b4087eff8994f1f99ac40c661a6621f)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install s390 cross compiling tool for clang build
# apt-get install binutils-s390x-linux-gnu
# https://github.com/0day-ci/linux/commit/6e669beb75caae92c613a012734b1a2dc9485524
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210303-191809
git checkout 6e669beb75caae92c613a012734b1a2dc9485524
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
___constant_swab32(x) : \
^
include/uapi/linux/swab.h:19:12: note: expanded from macro '___constant_swab32'
(((__u32)(x) & (__u32)0x000000ffUL) << 24) | \
^
In file included from arch/s390/kernel/uv.c:14:
In file included from include/linux/memblock.h:14:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
___constant_swab32(x) : \
^
include/uapi/linux/swab.h:20:12: note: expanded from macro '___constant_swab32'
(((__u32)(x) & (__u32)0x0000ff00UL) << 8) | \
^
In file included from arch/s390/kernel/uv.c:14:
In file included from include/linux/memblock.h:14:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
___constant_swab32(x) : \
^
include/uapi/linux/swab.h:21:12: note: expanded from macro '___constant_swab32'
(((__u32)(x) & (__u32)0x00ff0000UL) >> 8) | \
^
In file included from arch/s390/kernel/uv.c:14:
In file included from include/linux/memblock.h:14:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
___constant_swab32(x) : \
^
include/uapi/linux/swab.h:22:12: note: expanded from macro '___constant_swab32'
(((__u32)(x) & (__u32)0xff000000UL) >> 24)))
^
In file included from arch/s390/kernel/uv.c:14:
In file included from include/linux/memblock.h:14:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:120:12: note: expanded from macro '__swab32'
__fswab32(x))
^
In file included from arch/s390/kernel/uv.c:14:
In file included from include/linux/memblock.h:14:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsb(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsw(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsl(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesb(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesw(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesl(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
>> arch/s390/kernel/uv.c:272:22: error: too few arguments to function call, single argument 'force_all_cpus' was not specified
lru_add_drain_all();
~~~~~~~~~~~~~~~~~ ^
include/linux/swap.h:347:13: note: 'lru_add_drain_all' declared here
extern void lru_add_drain_all(bool force_all_cpus);
^
20 warnings and 1 error generated.
vim +/force_all_cpus +272 arch/s390/kernel/uv.c
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 212
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 213 /*
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 214 * Requests the Ultravisor to make a page accessible to a guest.
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 215 * If it's brought in the first time, it will be cleared. If
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 216 * it has been exported before, it will be decrypted and integrity
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 217 * checked.
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 218 */
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 219 int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 220 {
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 221 struct vm_area_struct *vma;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 222 bool local_drain = false;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 223 spinlock_t *ptelock;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 224 unsigned long uaddr;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 225 struct page *page;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 226 pte_t *ptep;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 227 int rc;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 228
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 229 again:
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 230 rc = -EFAULT;
d8ed45c5dcd455 Michel Lespinasse 2020-06-08 231 mmap_read_lock(gmap->mm);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 232
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 233 uaddr = __gmap_translate(gmap, gaddr);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 234 if (IS_ERR_VALUE(uaddr))
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 235 goto out;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 236 vma = find_vma(gmap->mm, uaddr);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 237 if (!vma)
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 238 goto out;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 239 /*
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 240 * Secure pages cannot be huge and userspace should not combine both.
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 241 * In case userspace does it anyway this will result in an -EFAULT for
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 242 * the unpack. The guest is thus never reaching secure mode. If
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 243 * userspace is playing dirty tricks with mapping huge pages later
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 244 * on this will result in a segmentation fault.
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 245 */
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 246 if (is_vm_hugetlb_page(vma))
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 247 goto out;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 248
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 249 rc = -ENXIO;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 250 page = follow_page(vma, uaddr, FOLL_WRITE);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 251 if (IS_ERR_OR_NULL(page))
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 252 goto out;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 253
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 254 lock_page(page);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 255 ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 256 rc = make_secure_pte(ptep, uaddr, page, uvcb);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 257 pte_unmap_unlock(ptep, ptelock);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 258 unlock_page(page);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 259 out:
d8ed45c5dcd455 Michel Lespinasse 2020-06-08 260 mmap_read_unlock(gmap->mm);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 261
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 262 if (rc == -EAGAIN) {
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 263 wait_on_page_writeback(page);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 264 } else if (rc == -EBUSY) {
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 265 /*
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 266 * If we have tried a local drain and the page refcount
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 267 * still does not match our expected safe value, try with a
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 268 * system wide drain. This is needed if the pagevecs holding
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 269 * the page are on a different CPU.
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 270 */
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 271 if (local_drain) {
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 @272 lru_add_drain_all();
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 273 /* We give up here, and let the caller try again */
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 274 return -EAGAIN;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 275 }
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 276 /*
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 277 * We are here if the page refcount does not match the
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 278 * expected safe value. The main culprits are usually
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 279 * pagevecs. With lru_add_drain() we drain the pagevecs
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 280 * on the local CPU so that hopefully the refcount will
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 281 * reach the expected safe value.
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 282 */
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 283 lru_add_drain();
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 284 local_drain = true;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 285 /* And now we try again immediately after draining */
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 286 goto again;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 287 } else if (rc == -ENXIO) {
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 288 if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 289 return -EFAULT;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 290 return -EAGAIN;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 291 }
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 292 return rc;
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 293 }
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 294 EXPORT_SYMBOL_GPL(gmap_make_secure);
214d9bbcd3a672 Claudio Imbrenda 2020-01-21 295
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 24383 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily
2021-03-02 21:09 [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily Minchan Kim
` (2 preceding siblings ...)
2021-03-03 13:38 ` kernel test robot
@ 2021-03-03 15:11 ` kernel test robot
2021-03-03 18:12 ` kernel test robot
4 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2021-03-03 15:11 UTC (permalink / raw)
To: Minchan Kim, Andrew Morton
Cc: kbuild-all, Linux Memory Management List, LKML, joaodias, surenb,
cgoldswo, willy, mhocko, david, vbabka
Hi Minchan,
I love your patch! Yet something to improve:
[auto build test ERROR on block/for-next]
[also build test ERROR on linux/master linus/master v5.12-rc1 next-20210303]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210303-191809
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: nios2-randconfig-r032-20210303 (attached as .config)
compiler: nios2-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/6e669beb75caae92c613a012734b1a2dc9485524
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210303-191809
git checkout 6e669beb75caae92c613a012734b1a2dc9485524
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nios2
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
mm/swap.c:59:6: warning: no previous prototype for 'lru_cache_disabled' [-Wmissing-prototypes]
59 | bool lru_cache_disabled(void)
| ^~~~~~~~~~~~~~~~~~
>> mm/swap.c:871:6: error: conflicting types for 'lru_add_drain_all'
871 | void lru_add_drain_all(void)
| ^~~~~~~~~~~~~~~~~
In file included from mm/swap.c:20:
include/linux/swap.h:347:13: note: previous declaration of 'lru_add_drain_all' was here
347 | extern void lru_add_drain_all(bool force_all_cpus);
| ^~~~~~~~~~~~~~~~~
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for SERIAL_CORE_CONSOLE
Depends on TTY && HAS_IOMEM
Selected by
- EARLY_PRINTK
vim +/lru_add_drain_all +871 mm/swap.c
053837fce7aa79 Nick Piggin 2006-01-18 768
9852a7212324fd Michal Hocko 2018-01-31 769 /*
9852a7212324fd Michal Hocko 2018-01-31 770 * Doesn't need any cpu hotplug locking because we do rely on per-cpu
9852a7212324fd Michal Hocko 2018-01-31 771 * kworkers being shut down before our page_alloc_cpu_dead callback is
9852a7212324fd Michal Hocko 2018-01-31 772 * executed on the offlined cpu.
9852a7212324fd Michal Hocko 2018-01-31 773 * Calling this function with cpu hotplug locks held can actually lead
9852a7212324fd Michal Hocko 2018-01-31 774 * to obscure indirect dependencies via WQ context.
9852a7212324fd Michal Hocko 2018-01-31 775 */
6e669beb75caae Minchan Kim 2021-03-02 776 void lru_add_drain_all(bool force_all_cpus)
053837fce7aa79 Nick Piggin 2006-01-18 777 {
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 778 /*
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 779 * lru_drain_gen - Global pages generation number
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 780 *
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 781 * (A) Definition: global lru_drain_gen = x implies that all generations
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 782 * 0 < n <= x are already *scheduled* for draining.
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 783 *
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 784 * This is an optimization for the highly-contended use case where a
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 785 * user space workload keeps constantly generating a flow of pages for
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 786 * each CPU.
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 787 */
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 788 static unsigned int lru_drain_gen;
5fbc461636c32e Chris Metcalf 2013-09-12 789 static struct cpumask has_work;
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 790 static DEFINE_MUTEX(lock);
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 791 unsigned cpu, this_gen;
5fbc461636c32e Chris Metcalf 2013-09-12 792
ce612879ddc78e Michal Hocko 2017-04-07 793 /*
ce612879ddc78e Michal Hocko 2017-04-07 794 * Make sure nobody triggers this path before mm_percpu_wq is fully
ce612879ddc78e Michal Hocko 2017-04-07 795 * initialized.
ce612879ddc78e Michal Hocko 2017-04-07 796 */
ce612879ddc78e Michal Hocko 2017-04-07 797 if (WARN_ON(!mm_percpu_wq))
ce612879ddc78e Michal Hocko 2017-04-07 798 return;
ce612879ddc78e Michal Hocko 2017-04-07 799
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 800 /*
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 801 * Guarantee pagevec counter stores visible by this CPU are visible to
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 802 * other CPUs before loading the current drain generation.
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 803 */
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 804 smp_mb();
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 805
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 806 /*
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 807 * (B) Locally cache global LRU draining generation number
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 808 *
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 809 * The read barrier ensures that the counter is loaded before the mutex
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 810 * is taken. It pairs with smp_mb() inside the mutex critical section
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 811 * at (D).
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 812 */
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 813 this_gen = smp_load_acquire(&lru_drain_gen);
eef1a429f234f8 Konstantin Khlebnikov 2019-11-30 814
5fbc461636c32e Chris Metcalf 2013-09-12 815 mutex_lock(&lock);
eef1a429f234f8 Konstantin Khlebnikov 2019-11-30 816
eef1a429f234f8 Konstantin Khlebnikov 2019-11-30 817 /*
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 818 * (C) Exit the draining operation if a newer generation, from another
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 819 * lru_add_drain_all(), was already scheduled for draining. Check (A).
eef1a429f234f8 Konstantin Khlebnikov 2019-11-30 820 */
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 821 if (unlikely(this_gen != lru_drain_gen))
eef1a429f234f8 Konstantin Khlebnikov 2019-11-30 822 goto done;
eef1a429f234f8 Konstantin Khlebnikov 2019-11-30 823
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 824 /*
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 825 * (D) Increment global generation number
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 826 *
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 827 * Pairs with smp_load_acquire() at (B), outside of the critical
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 828 * section. Use a full memory barrier to guarantee that the new global
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 829 * drain generation number is stored before loading pagevec counters.
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 830 *
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 831 * This pairing must be done here, before the for_each_online_cpu loop
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 832 * below which drains the page vectors.
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 833 *
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 834 * Let x, y, and z represent some system CPU numbers, where x < y < z.
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 835 * Assume CPU #z is in the middle of the for_each_online_cpu loop
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 836 * below and has already reached CPU #y's per-cpu data. CPU #x comes
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 837 * along, adds some pages to its per-cpu vectors, then calls
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 838 * lru_add_drain_all().
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 839 *
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 840 * If the paired barrier is done at any later step, e.g. after the
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 841 * loop, CPU #x will just exit at (C) and miss flushing out all of its
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 842 * added pages.
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 843 */
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 844 WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 845 smp_mb();
eef1a429f234f8 Konstantin Khlebnikov 2019-11-30 846
5fbc461636c32e Chris Metcalf 2013-09-12 847 cpumask_clear(&has_work);
5fbc461636c32e Chris Metcalf 2013-09-12 848 for_each_online_cpu(cpu) {
5fbc461636c32e Chris Metcalf 2013-09-12 849 struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
5fbc461636c32e Chris Metcalf 2013-09-12 850
6e669beb75caae Minchan Kim 2021-03-02 851 if (force_all_cpus ||
6e669beb75caae Minchan Kim 2021-03-02 852 pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
7e0cc01ea181c6 Qian Cai 2020-08-14 853 data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) ||
b01b2141999936 Ingo Molnar 2020-05-27 854 pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
b01b2141999936 Ingo Molnar 2020-05-27 855 pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
b01b2141999936 Ingo Molnar 2020-05-27 856 pagevec_count(&per_cpu(lru_pvecs.lru_lazyfree, cpu)) ||
5fbc461636c32e Chris Metcalf 2013-09-12 857 need_activate_page_drain(cpu)) {
5fbc461636c32e Chris Metcalf 2013-09-12 858 INIT_WORK(work, lru_add_drain_per_cpu);
ce612879ddc78e Michal Hocko 2017-04-07 859 queue_work_on(cpu, mm_percpu_wq, work);
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 860 __cpumask_set_cpu(cpu, &has_work);
5fbc461636c32e Chris Metcalf 2013-09-12 861 }
5fbc461636c32e Chris Metcalf 2013-09-12 862 }
5fbc461636c32e Chris Metcalf 2013-09-12 863
5fbc461636c32e Chris Metcalf 2013-09-12 864 for_each_cpu(cpu, &has_work)
5fbc461636c32e Chris Metcalf 2013-09-12 865 flush_work(&per_cpu(lru_add_drain_work, cpu));
5fbc461636c32e Chris Metcalf 2013-09-12 866
eef1a429f234f8 Konstantin Khlebnikov 2019-11-30 867 done:
5fbc461636c32e Chris Metcalf 2013-09-12 868 mutex_unlock(&lock);
053837fce7aa79 Nick Piggin 2006-01-18 869 }
6ea183d60c4695 Michal Hocko 2019-02-20 870 #else
6ea183d60c4695 Michal Hocko 2019-02-20 @871 void lru_add_drain_all(void)
6ea183d60c4695 Michal Hocko 2019-02-20 872 {
6ea183d60c4695 Michal Hocko 2019-02-20 873 lru_add_drain();
6ea183d60c4695 Michal Hocko 2019-02-20 874 }
6446a5131e24a8 Ahmed S. Darwish 2020-08-27 875 #endif /* CONFIG_SMP */
053837fce7aa79 Nick Piggin 2006-01-18 876
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 24755 bytes --]
* Re: [PATCH 2/2] mm: fs: Invalidate BH LRU during page migration
2021-03-02 21:09 ` [PATCH 2/2] mm: fs: Invalidate BH LRU during page migration Minchan Kim
2021-03-03 13:38 ` kernel test robot
@ 2021-03-03 15:55 ` kernel test robot
1 sibling, 0 replies; 13+ messages in thread
From: kernel test robot @ 2021-03-03 15:55 UTC (permalink / raw)
To: Minchan Kim, Andrew Morton
Cc: kbuild-all, clang-built-linux, Linux Memory Management List,
LKML, joaodias, surenb, cgoldswo, willy, mhocko, david, vbabka
Hi Minchan,
I love your patch! Yet something to improve:
[auto build test ERROR on block/for-next]
[also build test ERROR on linux/master linus/master v5.12-rc1 next-20210303]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210303-191809
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: powerpc64-randconfig-r006-20210303 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project a7cad6680b4087eff8994f1f99ac40c661a6621f)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
# https://github.com/0day-ci/linux/commit/a1c74fba81d1258e320ef52bc995cb0333e3e083
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210303-191809
git checkout a1c74fba81d1258e320ef52bc995cb0333e3e083
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> mm/swap.c:671:2: error: implicit declaration of function 'invalidate_bh_lru' [-Werror,-Wimplicit-function-declaration]
invalidate_bh_lru(NULL);
^
mm/swap.c:671:2: note: did you mean 'invalidate_bdev'?
include/linux/blkdev.h:2021:20: note: 'invalidate_bdev' declared here
static inline void invalidate_bdev(struct block_device *bdev)
^
>> mm/swap.c:860:7: error: implicit declaration of function 'has_bh_in_lru' [-Werror,-Wimplicit-function-declaration]
has_bh_in_lru(cpu, NULL)) {
^
2 errors generated.
vim +/invalidate_bh_lru +671 mm/swap.c
634
635 /*
636 * Drain pages out of the cpu's pagevecs.
637 * Either "cpu" is the current CPU, and preemption has already been
638 * disabled; or "cpu" is being hot-unplugged, and is already dead.
639 */
640 void lru_add_drain_cpu(int cpu)
641 {
642 struct pagevec *pvec = &per_cpu(lru_pvecs.lru_add, cpu);
643
644 if (pagevec_count(pvec))
645 __pagevec_lru_add(pvec);
646
647 pvec = &per_cpu(lru_rotate.pvec, cpu);
648 /* Disabling interrupts below acts as a compiler barrier. */
649 if (data_race(pagevec_count(pvec))) {
650 unsigned long flags;
651
652 /* No harm done if a racing interrupt already did this */
653 local_lock_irqsave(&lru_rotate.lock, flags);
654 pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
655 local_unlock_irqrestore(&lru_rotate.lock, flags);
656 }
657
658 pvec = &per_cpu(lru_pvecs.lru_deactivate_file, cpu);
659 if (pagevec_count(pvec))
660 pagevec_lru_move_fn(pvec, lru_deactivate_file_fn);
661
662 pvec = &per_cpu(lru_pvecs.lru_deactivate, cpu);
663 if (pagevec_count(pvec))
664 pagevec_lru_move_fn(pvec, lru_deactivate_fn);
665
666 pvec = &per_cpu(lru_pvecs.lru_lazyfree, cpu);
667 if (pagevec_count(pvec))
668 pagevec_lru_move_fn(pvec, lru_lazyfree_fn);
669
670 activate_page_drain(cpu);
> 671 invalidate_bh_lru(NULL);
672 }
673
674 /**
675 * deactivate_file_page - forcefully deactivate a file page
676 * @page: page to deactivate
677 *
678 * This function hints the VM that @page is a good reclaim candidate,
679 * for example if its invalidation fails due to the page being dirty
680 * or under writeback.
681 */
682 void deactivate_file_page(struct page *page)
683 {
684 /*
685 * In a workload with many unevictable pages such as mprotect,
686 * unevictable page deactivation for accelerating reclaim is pointless.
687 */
688 if (PageUnevictable(page))
689 return;
690
691 if (likely(get_page_unless_zero(page))) {
692 struct pagevec *pvec;
693
694 local_lock(&lru_pvecs.lock);
695 pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate_file);
696
697 if (pagevec_add_and_need_flush(pvec, page))
698 pagevec_lru_move_fn(pvec, lru_deactivate_file_fn);
699 local_unlock(&lru_pvecs.lock);
700 }
701 }
702
703 /*
704 * deactivate_page - deactivate a page
705 * @page: page to deactivate
706 *
707 * deactivate_page() moves @page to the inactive list if @page was on the active
708 * list and was not an unevictable page. This is done to accelerate the reclaim
709 * of @page.
710 */
711 void deactivate_page(struct page *page)
712 {
713 if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
714 struct pagevec *pvec;
715
716 local_lock(&lru_pvecs.lock);
717 pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate);
718 get_page(page);
719 if (pagevec_add_and_need_flush(pvec, page))
720 pagevec_lru_move_fn(pvec, lru_deactivate_fn);
721 local_unlock(&lru_pvecs.lock);
722 }
723 }
724
725 /**
726 * mark_page_lazyfree - make an anon page lazyfree
727 * @page: page to deactivate
728 *
729 * mark_page_lazyfree() moves @page to the inactive file list.
730 * This is done to accelerate the reclaim of @page.
731 */
732 void mark_page_lazyfree(struct page *page)
733 {
734 if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
735 !PageSwapCache(page) && !PageUnevictable(page)) {
736 struct pagevec *pvec;
737
738 local_lock(&lru_pvecs.lock);
739 pvec = this_cpu_ptr(&lru_pvecs.lru_lazyfree);
740 get_page(page);
741 if (pagevec_add_and_need_flush(pvec, page))
742 pagevec_lru_move_fn(pvec, lru_lazyfree_fn);
743 local_unlock(&lru_pvecs.lock);
744 }
745 }
746
747 void lru_add_drain(void)
748 {
749 local_lock(&lru_pvecs.lock);
750 lru_add_drain_cpu(smp_processor_id());
751 local_unlock(&lru_pvecs.lock);
752 }
753
754 void lru_add_drain_cpu_zone(struct zone *zone)
755 {
756 local_lock(&lru_pvecs.lock);
757 lru_add_drain_cpu(smp_processor_id());
758 drain_local_pages(zone);
759 local_unlock(&lru_pvecs.lock);
760 }
761
762 #ifdef CONFIG_SMP
763
764 static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);
765
766 static void lru_add_drain_per_cpu(struct work_struct *dummy)
767 {
768 lru_add_drain();
769 }
770
771 /*
772 * Doesn't need any cpu hotplug locking because we do rely on per-cpu
773 * kworkers being shut down before our page_alloc_cpu_dead callback is
774 * executed on the offlined cpu.
775 * Calling this function with cpu hotplug locks held can actually lead
776 * to obscure indirect dependencies via WQ context.
777 */
778 void lru_add_drain_all(bool force_all_cpus)
779 {
780 /*
781 * lru_drain_gen - Global pages generation number
782 *
783 * (A) Definition: global lru_drain_gen = x implies that all generations
784 * 0 < n <= x are already *scheduled* for draining.
785 *
786 * This is an optimization for the highly-contended use case where a
787 * user space workload keeps constantly generating a flow of pages for
788 * each CPU.
789 */
790 static unsigned int lru_drain_gen;
791 static struct cpumask has_work;
792 static DEFINE_MUTEX(lock);
793 unsigned cpu, this_gen;
794
795 /*
796 * Make sure nobody triggers this path before mm_percpu_wq is fully
797 * initialized.
798 */
799 if (WARN_ON(!mm_percpu_wq))
800 return;
801
802 /*
803 * Guarantee pagevec counter stores visible by this CPU are visible to
804 * other CPUs before loading the current drain generation.
805 */
806 smp_mb();
807
808 /*
809 * (B) Locally cache global LRU draining generation number
810 *
811 * The read barrier ensures that the counter is loaded before the mutex
812 * is taken. It pairs with smp_mb() inside the mutex critical section
813 * at (D).
814 */
815 this_gen = smp_load_acquire(&lru_drain_gen);
816
817 mutex_lock(&lock);
818
819 /*
820 * (C) Exit the draining operation if a newer generation, from another
821 * lru_add_drain_all(), was already scheduled for draining. Check (A).
822 */
823 if (unlikely(this_gen != lru_drain_gen))
824 goto done;
825
826 /*
827 * (D) Increment global generation number
828 *
829 * Pairs with smp_load_acquire() at (B), outside of the critical
830 * section. Use a full memory barrier to guarantee that the new global
831 * drain generation number is stored before loading pagevec counters.
832 *
833 * This pairing must be done here, before the for_each_online_cpu loop
834 * below which drains the page vectors.
835 *
836 * Let x, y, and z represent some system CPU numbers, where x < y < z.
837 * Assume CPU #z is in the middle of the for_each_online_cpu loop
838 * below and has already reached CPU #y's per-cpu data. CPU #x comes
839 * along, adds some pages to its per-cpu vectors, then calls
840 * lru_add_drain_all().
841 *
842 * If the paired barrier is done at any later step, e.g. after the
843 * loop, CPU #x will just exit at (C) and miss flushing out all of its
844 * added pages.
845 */
846 WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
847 smp_mb();
848
849 cpumask_clear(&has_work);
850 for_each_online_cpu(cpu) {
851 struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
852
853 if (force_all_cpus ||
854 pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
855 data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) ||
856 pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
857 pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
858 pagevec_count(&per_cpu(lru_pvecs.lru_lazyfree, cpu)) ||
859 need_activate_page_drain(cpu) ||
> 860 has_bh_in_lru(cpu, NULL)) {
861 INIT_WORK(work, lru_add_drain_per_cpu);
862 queue_work_on(cpu, mm_percpu_wq, work);
863 __cpumask_set_cpu(cpu, &has_work);
864 }
865 }
866
867 for_each_cpu(cpu, &has_work)
868 flush_work(&per_cpu(lru_add_drain_work, cpu));
869
870 done:
871 mutex_unlock(&lock);
872 }
873 #else
874 void lru_add_drain_all(void)
875 {
876 lru_add_drain();
877 }
878 #endif /* CONFIG_SMP */
879
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 42279 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily
2021-03-02 21:09 [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily Minchan Kim
` (3 preceding siblings ...)
2021-03-03 15:11 ` kernel test robot
@ 2021-03-03 18:12 ` kernel test robot
4 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2021-03-03 18:12 UTC (permalink / raw)
To: Minchan Kim, Andrew Morton
Cc: kbuild-all, clang-built-linux, Linux Memory Management List,
LKML, joaodias, surenb, cgoldswo, willy, mhocko, david, vbabka
[-- Attachment #1: Type: text/plain, Size: 10538 bytes --]
Hi Minchan,
I love your patch! Yet something to improve:
[auto build test ERROR on block/for-next]
[also build test ERROR on linux/master linus/master v5.12-rc1 next-20210303]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210303-191809
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: arm-randconfig-r021-20210303 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project a7cad6680b4087eff8994f1f99ac40c661a6621f)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/0day-ci/linux/commit/6e669beb75caae92c613a012734b1a2dc9485524
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Minchan-Kim/mm-disable-LRU-pagevec-during-the-migration-temporarily/20210303-191809
git checkout 6e669beb75caae92c613a012734b1a2dc9485524
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
In file included from fs/iomap/buffered-io.c:19:
>> include/linux/migrate.h:70:41: error: void function 'migrate_prep' should not return a value [-Wreturn-type]
static inline void migrate_prep(void) { return -ENOSYS; }
^ ~~~~~~~
>> include/linux/migrate.h:71:47: error: void function 'migrate_prep_local' should not return a value [-Wreturn-type]
static inline void migrate_prep_local(void) { return -ENOSYS; }
^ ~~~~~~~
2 errors generated.
--
In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:40:
In file included from drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:86:
In file included from drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu_mn.h:28:
In file included from include/linux/hmm.h:16:
>> include/linux/migrate.h:70:41: error: void function 'migrate_prep' should not return a value [-Wreturn-type]
static inline void migrate_prep(void) { return -ENOSYS; }
^ ~~~~~~~
>> include/linux/migrate.h:71:47: error: void function 'migrate_prep_local' should not return a value [-Wreturn-type]
static inline void migrate_prep_local(void) { return -ENOSYS; }
^ ~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1143:18: warning: result of comparison of constant 4294967296 with expression of type 'resource_size_t' (aka 'unsigned int') is always false [-Wtautological-constant-out-of-range-compare]
res->start > 0x100000000ull)
~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
include/linux/compiler.h:56:47: note: expanded from macro 'if'
#define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
^~~~
include/linux/compiler.h:58:86: note: expanded from macro '__trace_if_var'
#define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
^~~~
include/linux/compiler.h:69:3: note: expanded from macro '__trace_if_value'
(cond) ? \
^~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1143:18: warning: result of comparison of constant 4294967296 with expression of type 'resource_size_t' (aka 'unsigned int') is always false [-Wtautological-constant-out-of-range-compare]
res->start > 0x100000000ull)
~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
include/linux/compiler.h:56:47: note: expanded from macro 'if'
#define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
^~~~
include/linux/compiler.h:58:52: note: expanded from macro '__trace_if_var'
#define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
^~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1143:18: warning: result of comparison of constant 4294967296 with expression of type 'resource_size_t' (aka 'unsigned int') is always false [-Wtautological-constant-out-of-range-compare]
res->start > 0x100000000ull)
~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
include/linux/compiler.h:56:47: note: expanded from macro 'if'
#define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
^~~~
include/linux/compiler.h:58:61: note: expanded from macro '__trace_if_var'
#define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
^~~~
3 warnings and 2 errors generated.
--
In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:36:
In file included from drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:86:
In file included from drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu_mn.h:28:
In file included from include/linux/hmm.h:16:
>> include/linux/migrate.h:70:41: error: void function 'migrate_prep' should not return a value [-Wreturn-type]
static inline void migrate_prep(void) { return -ENOSYS; }
^ ~~~~~~~
>> include/linux/migrate.h:71:47: error: void function 'migrate_prep_local' should not return a value [-Wreturn-type]
static inline void migrate_prep_local(void) { return -ENOSYS; }
^ ~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:7: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
version_major, version_minor);
^~~~~~~~~~~~~
include/drm/drm_print.h:498:19: note: expanded from macro 'DRM_ERROR'
__drm_err(fmt, ##__VA_ARGS__)
~~~ ^~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:22: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
version_major, version_minor);
^~~~~~~~~~~~~
include/drm/drm_print.h:498:19: note: expanded from macro 'DRM_ERROR'
__drm_err(fmt, ##__VA_ARGS__)
~~~ ^~~~~~~~~~~
2 warnings and 2 errors generated.
--
In file included from mm/page_alloc.c:61:
>> include/linux/migrate.h:70:41: error: void function 'migrate_prep' should not return a value [-Wreturn-type]
static inline void migrate_prep(void) { return -ENOSYS; }
^ ~~~~~~~
>> include/linux/migrate.h:71:47: error: void function 'migrate_prep_local' should not return a value [-Wreturn-type]
static inline void migrate_prep_local(void) { return -ENOSYS; }
^ ~~~~~~~
mm/page_alloc.c:2621:5: warning: no previous prototype for function 'find_suitable_fallback' [-Wmissing-prototypes]
int find_suitable_fallback(struct free_area *area, unsigned int order,
^
mm/page_alloc.c:2621:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int find_suitable_fallback(struct free_area *area, unsigned int order,
^
static
mm/page_alloc.c:3600:15: warning: no previous prototype for function 'should_fail_alloc_page' [-Wmissing-prototypes]
noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
^
mm/page_alloc.c:3600:10: note: declare 'static' if the function is not intended to be used outside of this translation unit
noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
^
static
2 warnings and 2 errors generated.
--
In file included from kernel/sched/rt.c:6:
In file included from kernel/sched/sched.h:53:
>> include/linux/migrate.h:70:41: error: void function 'migrate_prep' should not return a value [-Wreturn-type]
static inline void migrate_prep(void) { return -ENOSYS; }
^ ~~~~~~~
>> include/linux/migrate.h:71:47: error: void function 'migrate_prep_local' should not return a value [-Wreturn-type]
static inline void migrate_prep_local(void) { return -ENOSYS; }
^ ~~~~~~~
kernel/sched/rt.c:669:6: warning: no previous prototype for function 'sched_rt_bandwidth_account' [-Wmissing-prototypes]
bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
^
kernel/sched/rt.c:669:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
^
static
1 warning and 2 errors generated.
vim +/migrate_prep +70 include/linux/migrate.h
58
59 static inline void putback_movable_pages(struct list_head *l) {}
60 static inline int migrate_pages(struct list_head *l, new_page_t new,
61 free_page_t free, unsigned long private, enum migrate_mode mode,
62 int reason)
63 { return -ENOSYS; }
64 static inline struct page *alloc_migration_target(struct page *page,
65 unsigned long private)
66 { return NULL; }
67 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
68 { return -EBUSY; }
69
> 70 static inline void migrate_prep(void) { return -ENOSYS; }
> 71 static inline void migrate_prep_local(void) { return -ENOSYS; }
72 static inline void migrate_done(void) {}
73
---
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32862 bytes --]
* Re: [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily
2021-03-03 12:49 ` [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily Michal Hocko
@ 2021-03-03 20:23 ` Minchan Kim
2021-03-04 8:07 ` David Hildenbrand
2021-03-05 16:06 ` Michal Hocko
0 siblings, 2 replies; 13+ messages in thread
From: Minchan Kim @ 2021-03-03 20:23 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, linux-mm, LKML, joaodias, surenb, cgoldswo, willy,
david, vbabka, linux-fsdevel
On Wed, Mar 03, 2021 at 01:49:36PM +0100, Michal Hocko wrote:
> On Tue 02-03-21 13:09:48, Minchan Kim wrote:
> > The LRU pagevec holds a refcount on its pages until the pagevec is
> > drained. That can prevent migration, since the refcount of the page
> > is then greater than what the migration logic expects. To mitigate
> > the issue, callers of migrate_pages drain the LRU pagevec via
> > migrate_prep or lru_add_drain_all before calling migrate_pages.
> >
> > However, that is not enough, because pages entering the pagevec
> > after the draining call can still sit in the pagevec and keep
> > preventing page migration. Since some callers of migrate_pages have
> > retry logic with LRU draining, the page would migrate on the next
> > trial, but this is still fragile in that it doesn't close the
> > fundamental race between pages newly entering the pagevec and
> > migration, so the migration failure could ultimately cause a
> > contiguous memory allocation failure.
> >
> > To close the race, this patch disables the LRU caches (i.e., the
> > pagevecs) while migration is ongoing, until it is done.
> >
> > Since the race is really hard to reproduce, I measured how many
> > times migrate_pages retried in force mode with the debug code below.
> >
> > int migrate_pages(struct list_head *from, new_page_t get_new_page,
> > ..
> > ..
> >
> > if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
> > printk(KERN_ERR "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
> > dump_page(page, "fail to migrate");
> > }
> >
> > The test repeatedly launched Android apps with a cma allocation in
> > the background every five seconds. The total cma allocation count
> > was about 500 during the test. With this patch, the dump_page count
> > was reduced from 400 to 30.
>
> Have you seen any improvement on the CMA allocation success rate?
Unfortunately, a cma allocation failure rate with a reasonable margin
of error is really hard to reproduce under a real workload.
That's why I measured the soft metric instead of direct cma failures
under a real workload (I didn't want to build some ad-hoc artificial
benchmark and keep tuning system knobs until it showed an extremely
exaggerated result just to make the patch look effective).
Please say so if you believe this work is pointless unless there is
stable data from a reproducible scenario. I am happy to drop it.
>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> > * from RFC - http://lore.kernel.org/linux-mm/20210216170348.1513483-1-minchan@kernel.org
> > * use atomic and lru_add_drain_all for strict ordering - mhocko
> > * lru_cache_disable/enable - mhocko
> >
> > fs/block_dev.c | 2 +-
> > include/linux/migrate.h | 6 +++--
> > include/linux/swap.h | 4 ++-
> > mm/compaction.c | 4 +--
> > mm/fadvise.c | 2 +-
> > mm/gup.c | 2 +-
> > mm/khugepaged.c | 2 +-
> > mm/ksm.c | 2 +-
> > mm/memcontrol.c | 4 +--
> > mm/memfd.c | 2 +-
> > mm/memory-failure.c | 2 +-
> > mm/memory_hotplug.c | 2 +-
> > mm/mempolicy.c | 6 +++++
> > mm/migrate.c | 15 ++++++-----
> > mm/page_alloc.c | 5 +++-
> > mm/swap.c | 55 +++++++++++++++++++++++++++++++++++------
> > 16 files changed, 85 insertions(+), 30 deletions(-)
>
> The churn seems to be quite big for something that should have been a
> very small change. Have you considered not changing lru_add_drain_all
> but rather introducing __lru_add_drain_all, which would implement the
> enforced flushing?
Good idea.
>
> [...]
> > +static atomic_t lru_disable_count = ATOMIC_INIT(0);
> > +
> > +bool lru_cache_disabled(void)
> > +{
> > + return atomic_read(&lru_disable_count);
> > +}
> > +
> > +void lru_cache_disable(void)
> > +{
> > + /*
> > + * lru_add_drain_all's IPI will make sure no new pages are added
> > + * to the pcp lists and drain them all.
> > + */
> > + atomic_inc(&lru_disable_count);
>
> As already mentioned in the last review, the IPI reference is more
> cryptic than useful. I would go with something like this instead:
>
> /*
> * lru_add_drain_all in the force mode will schedule draining on
> * all online CPUs so any calls of lru_cache_disabled wrapped by
> * local_lock or preemption disabled would be ordered by that.
> * The atomic operation doesn't need to have stronger ordering
> * requirements because that is enforced by the scheduling
> * guarantees.
> */
Thanks for the nice description.
I will use it in the next revision if you believe this work is useful.
* Re: [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily
2021-03-03 20:23 ` Minchan Kim
@ 2021-03-04 8:07 ` David Hildenbrand
2021-03-04 15:55 ` Minchan Kim
2021-03-05 16:06 ` Michal Hocko
1 sibling, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2021-03-04 8:07 UTC (permalink / raw)
To: Minchan Kim, Michal Hocko
Cc: Andrew Morton, linux-mm, LKML, joaodias, surenb, cgoldswo, willy,
vbabka, linux-fsdevel
On 03.03.21 21:23, Minchan Kim wrote:
> On Wed, Mar 03, 2021 at 01:49:36PM +0100, Michal Hocko wrote:
>> On Tue 02-03-21 13:09:48, Minchan Kim wrote:
>>> The LRU pagevec holds a refcount on its pages until the pagevec is
>>> drained. That can prevent migration, since the refcount of the page
>>> is then greater than what the migration logic expects. To mitigate
>>> the issue, callers of migrate_pages drain the LRU pagevec via
>>> migrate_prep or lru_add_drain_all before calling migrate_pages.
>>>
>>> However, that is not enough, because pages entering the pagevec
>>> after the draining call can still sit in the pagevec and keep
>>> preventing page migration. Since some callers of migrate_pages have
>>> retry logic with LRU draining, the page would migrate on the next
>>> trial, but this is still fragile in that it doesn't close the
>>> fundamental race between pages newly entering the pagevec and
>>> migration, so the migration failure could ultimately cause a
>>> contiguous memory allocation failure.
>>>
>>> To close the race, this patch disables the LRU caches (i.e., the
>>> pagevecs) while migration is ongoing, until it is done.
>>>
>>> Since the race is really hard to reproduce, I measured how many
>>> times migrate_pages retried in force mode with the debug code below.
>>>
>>> int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>> ..
>>> ..
>>>
>>> if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
>>> printk(KERN_ERR "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
>>> dump_page(page, "fail to migrate");
>>> }
>>>
>>> The test repeatedly launched Android apps with a cma allocation in
>>> the background every five seconds. The total cma allocation count
>>> was about 500 during the test. With this patch, the dump_page count
>>> was reduced from 400 to 30.
>>
>> Have you seen any improvement on the CMA allocation success rate?
>
> Unfortunately, a cma allocation failure rate with a reasonable margin
> of error is really hard to reproduce under a real workload.
> That's why I measured the soft metric instead of direct cma failures
> under a real workload (I didn't want to build some ad-hoc artificial
> benchmark and keep tuning system knobs until it showed an extremely
> exaggerated result just to make the patch look effective).
>
> Please say so if you believe this work is pointless unless there is
> stable data from a reproducible scenario. I am happy to drop it.
Do you have *some* application that triggers such a high retry count?
I'd love to run it along with virtio-mem and report the actual
allocation success rate / necessary retries. That could give an
indication of how helpful your work would be.
Anything that improves the reliability of alloc_contig_range() is of
high interest to me. If it doesn't increase the reliability but merely
makes some internal improvements (fewer retries), it might still be
valuable, but not that important.
--
Thanks,
David / dhildenb
* Re: [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily
2021-03-04 8:07 ` David Hildenbrand
@ 2021-03-04 15:55 ` Minchan Kim
0 siblings, 0 replies; 13+ messages in thread
From: Minchan Kim @ 2021-03-04 15:55 UTC (permalink / raw)
To: David Hildenbrand
Cc: Michal Hocko, Andrew Morton, linux-mm, LKML, joaodias, surenb,
cgoldswo, willy, vbabka, linux-fsdevel
On Thu, Mar 04, 2021 at 09:07:28AM +0100, David Hildenbrand wrote:
> On 03.03.21 21:23, Minchan Kim wrote:
> > On Wed, Mar 03, 2021 at 01:49:36PM +0100, Michal Hocko wrote:
> > > On Tue 02-03-21 13:09:48, Minchan Kim wrote:
> > > > The LRU pagevec holds a refcount on its pages until the pagevec is
> > > > drained. That can prevent migration, since the refcount of the page
> > > > is then greater than what the migration logic expects. To mitigate
> > > > the issue, callers of migrate_pages drain the LRU pagevec via
> > > > migrate_prep or lru_add_drain_all before calling migrate_pages.
> > > >
> > > > However, that is not enough, because pages entering the pagevec
> > > > after the draining call can still sit in the pagevec and keep
> > > > preventing page migration. Since some callers of migrate_pages have
> > > > retry logic with LRU draining, the page would migrate on the next
> > > > trial, but this is still fragile in that it doesn't close the
> > > > fundamental race between pages newly entering the pagevec and
> > > > migration, so the migration failure could ultimately cause a
> > > > contiguous memory allocation failure.
> > > >
> > > > To close the race, this patch disables the LRU caches (i.e., the
> > > > pagevecs) while migration is ongoing, until it is done.
> > > >
> > > > Since the race is really hard to reproduce, I measured how many
> > > > times migrate_pages retried in force mode with the debug code below.
> > > >
> > > > int migrate_pages(struct list_head *from, new_page_t get_new_page,
> > > > ..
> > > > ..
> > > >
> > > > if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
> > > > printk(KERN_ERR "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
> > > > dump_page(page, "fail to migrate");
> > > > }
> > > >
> > > > The test repeatedly launched Android apps with a cma allocation in
> > > > the background every five seconds. The total cma allocation count
> > > > was about 500 during the test. With this patch, the dump_page count
> > > > was reduced from 400 to 30.
> > >
> > > Have you seen any improvement on the CMA allocation success rate?
> >
> > Unfortunately, a cma allocation failure rate with a reasonable margin
> > of error is really hard to reproduce under a real workload.
> > That's why I measured the soft metric instead of direct cma failures
> > under a real workload (I didn't want to build some ad-hoc artificial
> > benchmark and keep tuning system knobs until it showed an extremely
> > exaggerated result just to make the patch look effective).
> >
> > Please say so if you believe this work is pointless unless there is
> > stable data from a reproducible scenario. I am happy to drop it.
>
> Do you have *some* application that triggers such a high retry count?
I have no idea which specific application could trigger the high
retry count, since the LRUs (the VM LRU and the buffer_head LRU) are
common infrastructure that anybody can use and any process can trigger.
>
> I'd love to run it along with virtio-mem and report the actual allocation
> success rate / necessary retries. That could give an indication of how
> helpful your work would be.
If it could give a stable report, that would be very helpful.
>
> Anything that improves the reliability of alloc_contig_range() is of high
> interest to me. If it doesn't increase the reliability but merely makes some
> internal improvements (fewer retries), it might still be valuable, but not
> that important.
Fewer retries are good, but I'd like to put more effort into closing
the race I mentioned completely, since cma allocation failures are
critical for the user experience in our use cases.
* Re: [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily
2021-03-03 20:23 ` Minchan Kim
2021-03-04 8:07 ` David Hildenbrand
@ 2021-03-05 16:06 ` Michal Hocko
2021-03-05 20:26 ` Minchan Kim
1 sibling, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2021-03-05 16:06 UTC (permalink / raw)
To: Minchan Kim
Cc: Andrew Morton, linux-mm, LKML, joaodias, surenb, cgoldswo, willy,
david, vbabka, linux-fsdevel
On Wed 03-03-21 12:23:22, Minchan Kim wrote:
> On Wed, Mar 03, 2021 at 01:49:36PM +0100, Michal Hocko wrote:
> > On Tue 02-03-21 13:09:48, Minchan Kim wrote:
> > > The LRU pagevec holds a refcount on its pages until the pagevec is
> > > drained. That can prevent migration, since the refcount of the page
> > > is then greater than what the migration logic expects. To mitigate
> > > the issue, callers of migrate_pages drain the LRU pagevec via
> > > migrate_prep or lru_add_drain_all before calling migrate_pages.
> > >
> > > However, that is not enough, because pages entering the pagevec
> > > after the draining call can still sit in the pagevec and keep
> > > preventing page migration. Since some callers of migrate_pages have
> > > retry logic with LRU draining, the page would migrate on the next
> > > trial, but this is still fragile in that it doesn't close the
> > > fundamental race between pages newly entering the pagevec and
> > > migration, so the migration failure could ultimately cause a
> > > contiguous memory allocation failure.
> > >
> > > To close the race, this patch disables the LRU caches (i.e., the
> > > pagevecs) while migration is ongoing, until it is done.
> > >
> > > Since the race is really hard to reproduce, I measured how many
> > > times migrate_pages retried in force mode with the debug code below.
> > >
> > > int migrate_pages(struct list_head *from, new_page_t get_new_page,
> > > ..
> > > ..
> > >
> > > if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
> > > printk(KERN_ERR "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
> > > dump_page(page, "fail to migrate");
> > > }
> > >
> > > The test repeatedly launched Android apps with a cma allocation in
> > > the background every five seconds. The total cma allocation count
> > > was about 500 during the test. With this patch, the dump_page count
> > > was reduced from 400 to 30.
> >
> > Have you seen any improvement on the CMA allocation success rate?
>
> Unfortunately, a cma allocation failure rate with a reasonable margin
> of error is really hard to reproduce under a real workload.
> That's why I measured the soft metric instead of direct cma failures
> under a real workload (I didn't want to build some ad-hoc artificial
> benchmark and keep tuning system knobs until it showed an extremely
> exaggerated result just to make the patch look effective).
>
> Please say so if you believe this work is pointless unless there is
> stable data from a reproducible scenario. I am happy to drop it.
Well, I am not saying that this is pointless. In the end the resulting
change is relatively small, and it provides useful functionality for
other users (e.g. hotplug). That should be sufficient justification.
I was asking about the CMA allocation success rate because that is a
much more reasonable metric than how many times something retried:
retries can help increase the success rate, and the patch doesn't
really remove them. If you want to use the number of retries as a
metric, then the average allocation latency would be more meaningful.
--
Michal Hocko
SUSE Labs
* Re: [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily
2021-03-05 16:06 ` Michal Hocko
@ 2021-03-05 20:26 ` Minchan Kim
0 siblings, 0 replies; 13+ messages in thread
From: Minchan Kim @ 2021-03-05 20:26 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, linux-mm, LKML, joaodias, surenb, cgoldswo, willy,
david, vbabka, linux-fsdevel
On Fri, Mar 05, 2021 at 05:06:17PM +0100, Michal Hocko wrote:
> On Wed 03-03-21 12:23:22, Minchan Kim wrote:
> > On Wed, Mar 03, 2021 at 01:49:36PM +0100, Michal Hocko wrote:
> > > On Tue 02-03-21 13:09:48, Minchan Kim wrote:
> > > > The LRU pagevec holds a refcount on its pages until the pagevec is
> > > > drained. That can prevent migration, since the refcount of the page
> > > > is then greater than what the migration logic expects. To mitigate
> > > > the issue, callers of migrate_pages drain the LRU pagevec via
> > > > migrate_prep or lru_add_drain_all before calling migrate_pages.
> > > >
> > > > However, that is not enough, because pages entering the pagevec
> > > > after the draining call can still sit in the pagevec and keep
> > > > preventing page migration. Since some callers of migrate_pages have
> > > > retry logic with LRU draining, the page would migrate on the next
> > > > trial, but this is still fragile in that it doesn't close the
> > > > fundamental race between pages newly entering the pagevec and
> > > > migration, so the migration failure could ultimately cause a
> > > > contiguous memory allocation failure.
> > > >
> > > > To close the race, this patch disables the LRU caches (i.e., the
> > > > pagevecs) while migration is ongoing, until it is done.
> > > >
> > > > Since the race is really hard to reproduce, I measured how many
> > > > times migrate_pages retried in force mode with the debug code below.
> > > >
> > > > int migrate_pages(struct list_head *from, new_page_t get_new_page,
> > > > ..
> > > > ..
> > > >
> > > > if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
> > > > printk(KERN_ERR "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
> > > > dump_page(page, "fail to migrate");
> > > > }
> > > >
> > > > The test repeatedly launched Android apps with a cma allocation in
> > > > the background every five seconds. The total cma allocation count
> > > > was about 500 during the test. With this patch, the dump_page count
> > > > was reduced from 400 to 30.
> > >
> > > Have you seen any improvement on the CMA allocation success rate?
> >
> > Unfortunately, a cma allocation failure rate with a reasonable margin
> > of error is really hard to reproduce under a real workload.
> > That's why I measured the soft metric instead of direct cma failures
> > under a real workload (I didn't want to build some ad-hoc artificial
> > benchmark and keep tuning system knobs until it showed an extremely
> > exaggerated result just to make the patch look effective).
> >
> > Please say so if you believe this work is pointless unless there is
> > stable data from a reproducible scenario. I am happy to drop it.
>
> Well, I am not saying that this is pointless. In the end the resulting
> change is relatively small, and it provides useful functionality for
> other users (e.g. hotplug). That should be sufficient justification.
Yep, that was my impression: it seemed worth upstreaming rather than
keeping it as a divergent downstream tree.
>
> I was asking about the CMA allocation success rate because that is a
> much more reasonable metric than how many times something retried:
> retries can help increase the success rate, and the patch doesn't
> really remove them. If you want to use the number of retries as a
> metric, then the average allocation latency would be more meaningful.
I believe the allocation latency would be pretty big and the retry
part marginal, so I doubt it would be meaningful.
Let me send the next revision with the description as-is once I fix
the places you pointed out.
end of thread, other threads:[~2021-03-05 20:26 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-02 21:09 [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily Minchan Kim
2021-03-02 21:09 ` [PATCH 2/2] mm: fs: Invalidate BH LRU during page migration Minchan Kim
2021-03-03 13:38 ` kernel test robot
2021-03-03 15:55 ` kernel test robot
2021-03-03 12:49 ` [PATCH 1/2] mm: disable LRU pagevec during the migration temporarily Michal Hocko
2021-03-03 20:23 ` Minchan Kim
2021-03-04 8:07 ` David Hildenbrand
2021-03-04 15:55 ` Minchan Kim
2021-03-05 16:06 ` Michal Hocko
2021-03-05 20:26 ` Minchan Kim
2021-03-03 13:38 ` kernel test robot
2021-03-03 15:11 ` kernel test robot
2021-03-03 18:12 ` kernel test robot