Linux-mm Archive on
* [patch 029/158] mm/swap.c: piggyback lru_add_drain_all() calls
@ 2019-12-01  1:50 akpm
  0 siblings, 0 replies; only message in thread
From: akpm @ 2019-12-01  1:50 UTC (permalink / raw)
  To: akpm, khlebnikov, linux-mm, mhocko, mm-commits, torvalds, willy

From: Konstantin Khlebnikov <>
Subject: mm/swap.c: piggyback lru_add_drain_all() calls

lru_add_drain_all() is a very slow operation.  Right now POSIX_FADV_DONTNEED
is its top user, because it has to freeze page references when removing pages
from the page cache.  invalidate_bdev() calls it for the same reason.  Both
are triggered from userspace, so it's easy to generate a storm.

mlock/mlockall no longer call lru_add_drain_all() - I've seen serious
slowdowns here on older kernels.

There are some less obvious paths in memory migration/CMA/offlining which
shouldn't call it frequently.

The worst case requires a non-trivial workload, because lru_add_drain_all()
skips cpus where the vectors are empty.  Something must constantly generate a
flow of pages for each cpu.  Also cpus must be busy enough to make scheduling
of the per-cpu works slower.  And the machine must be big enough (64+ cpus in
our case).

In our case that was a massive series of mlock calls in map-reduce while
other tasks wrote logs (and generated flows of new pages in the per-cpu
vectors).  Mlock calls were serialized by the mutex and accumulated latency
of up to 10 seconds or more.

The kernel does not call lru_add_drain_all on mlock paths since 4.15, but
the same scenario could be triggered by fadvise(POSIX_FADV_DONTNEED) or
any other remaining user.

There is no reason to do the drain again if somebody else already drained
all the per-cpu vectors while we waited for the lock.

Piggyback on a drain starting and finishing while we wait for the lock:
all pages pending at the time of our entry were drained from the vectors.

Callers like POSIX_FADV_DONTNEED retry their operations once after
draining per-cpu vectors when pages have unexpected references.

Signed-off-by: Konstantin Khlebnikov <>
Reviewed-by: Andrew Morton <>
Cc: Michal Hocko <>
Cc: Matthew Wilcox <>
Signed-off-by: Andrew Morton <>

 mm/swap.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/mm/swap.c~mm-swap-piggyback-lru_add_drain_all-calls
+++ a/mm/swap.c
@@ -713,9 +713,10 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
  */
 void lru_add_drain_all(void)
 {
+	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
 	static DEFINE_MUTEX(lock);
 	static struct cpumask has_work;
-	int cpu;
+	int cpu, seq;
 
 	/*
 	 * Make sure nobody triggers this path before mm_percpu_wq is fully
@@ -724,7 +725,19 @@ void lru_add_drain_all(void)
 	if (WARN_ON(!mm_percpu_wq))
 		return;
 
+	seq = raw_read_seqcount_latch(&seqcount);
+
 	mutex_lock(&lock);
+
+	/*
+	 * Piggyback on drain started and finished while we waited for lock:
+	 * all pages pended at the time of our enter were drained from vectors.
+	 */
+	if (__read_seqcount_retry(&seqcount, seq))
+		goto done;
+
+	raw_write_seqcount_latch(&seqcount);
+
 	cpumask_clear(&has_work);
 
 	for_each_online_cpu(cpu) {
@@ -745,6 +758,7 @@ void lru_add_drain_all(void)
 	for_each_cpu(cpu, &has_work)
 		flush_work(&per_cpu(lru_add_drain_work, cpu));
 
+done:
 	mutex_unlock(&lock);
 }
