* [patch 029/158] mm/swap.c: piggyback lru_add_drain_all() calls
From: akpm @ 2019-12-01 1:50 UTC (permalink / raw)
To: akpm, khlebnikov, linux-mm, mhocko, mm-commits, torvalds, willy
From: Konstantin Khlebnikov <email@example.com>
Subject: mm/swap.c: piggyback lru_add_drain_all() calls
lru_add_drain_all() is a very slow operation. Right now POSIX_FADV_DONTNEED
is the top user because it has to freeze page references when removing pages
from the cache. invalidate_bdev() calls it for the same reason. Both are
triggered from userspace, so it's easy to generate a storm of such calls.
mlock/mlockall no longer call lru_add_drain_all(); I have seen serious
slowdowns from it there on older kernels.
There are also some less obvious paths in memory migration/CMA/offlining
which shouldn't call it frequently.
The worst case requires a non-trivial workload because lru_add_drain_all()
skips cpus where the vectors are empty. Something must constantly generate a
flow of pages for each cpu. The cpus must also be busy, so that scheduling
the per-cpu works is slow. And the machine must be big enough (64+ cpus in
our case).
In our case the trigger was a massive series of mlock calls in a map-reduce
job while other tasks wrote logs (and thereby generated a flow of new pages
into the per-cpu vectors). The mlock calls were serialized by a mutex and
accumulated latency of up to 10 seconds or more.
The kernel has not called lru_add_drain_all() on mlock paths since 4.15, but
the same scenario can still be triggered by fadvise(POSIX_FADV_DONTNEED) or
any other remaining user.
There is no reason to do the drain again if somebody else already drained
all the per-cpu vectors while we waited for the lock.
Piggyback on a drain starting and finishing while we wait for the lock:
all pages pending at the time of our entry were drained from the vectors.
Callers like POSIX_FADV_DONTNEED retry their operations once after
draining per-cpu vectors when pages have unexpected references.
Signed-off-by: Konstantin Khlebnikov <firstname.lastname@example.org>
Reviewed-by: Andrew Morton <email@example.com>
Cc: Michal Hocko <firstname.lastname@example.org>
Cc: Matthew Wilcox <email@example.com>
Signed-off-by: Andrew Morton <firstname.lastname@example.org>
mm/swap.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
@@ -713,9 +713,10 @@ static void lru_add_drain_per_cpu(struct
+	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
 	static struct cpumask has_work;
-	int cpu;
+	int cpu, seq;
 	/*
 	 * Make sure nobody triggers this path before mm_percpu_wq is fully
 	 * initialized.
 	 */
@@ -724,7 +725,19 @@ void lru_add_drain_all(void)
+	seq = raw_read_seqcount_latch(&seqcount);
+	/*
+	 * Piggyback on drain started and finished while we waited for lock:
+	 * all pages pended at the time of our enter were drained from vectors.
+	 */
+	if (__read_seqcount_retry(&seqcount, seq))
+		goto done;
@@ -745,6 +758,7 @@ void lru_add_drain_all(void)
Linux-mm Archive on lore.kernel.org