From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
To: "Ahmed S. Darwish" <a.darwish@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Sebastian A. Siewior" <bigeasy@linutronix.de>,
Steven Rostedt <rostedt@goodmis.org>,
LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org
Subject: Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
Date: Wed, 20 May 2020 15:22:15 +0300
Message-ID: <706c697c-c951-e0c3-40f8-f6e429b2226a@yandex-team.ru>
In-Reply-To: <20200519214547.352050-3-a.darwish@linutronix.de>
On 20/05/2020 00.45, Ahmed S. Darwish wrote:
> Commit eef1a429f234 ("mm/swap.c: piggyback lru_add_drain_all() calls")
> implemented an optimization mechanism to exit the to-be-started LRU
> drain operation (name it A) if another drain operation *started and
> finished* while (A) was blocked on the LRU draining mutex.
>
> This was done through a seqcount latch, which is an abuse of its
> semantics:
>
> 1. Seqcount latching should be used for the purpose of switching
> between two storage places with sequence protection to allow
> interruptible, preemptible writer sections. The optimization
> mechanism has absolutely nothing to do with that.
>
> 2. The used raw_write_seqcount_latch() has two smp write memory
> barriers to always ensure one consistent storage place out of the
> two storage places available. This extra smp_wmb() is redundant for
> the optimization use case.
>
> Besides the API abuse, the semantics of a latch sequence counter were
> force-fitted into the optimization. What was actually meant is to track
> generations of LRU draining operations, where "current lru draining
> generation = x" implies that all generations 0 < n <= x are already
> *scheduled* for draining.
>
> Remove the conceptually-inappropriate seqcount latch usage and manually
> implement the optimization using a counter and SMP memory barriers.

Well, I thought it fits perfectly =)
Maybe it's worth adding helpers with appropriate semantics?
This is a pretty common pattern.
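
To make the suggestion concrete, here is a rough userspace sketch of what such helpers could look like. Everything in it is hypothetical: struct gen_counter, gen_snapshot() and gen_try_begin() are names made up for illustration, not existing kernel APIs, and C11 atomics stand in for READ_ONCE()/WRITE_ONCE() plus barriers. The acquire load and the sequentially consistent increment are conservative assumptions, not a claim about the minimal ordering the kernel code needs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical generation counter; not an existing kernel type. */
struct gen_counter {
	atomic_uint gen;
};

#define GEN_COUNTER_INIT { 0 }

/*
 * Take a snapshot of the current generation. Meant to be called
 * before blocking on the lock that serializes the expensive work.
 */
static inline unsigned int gen_snapshot(struct gen_counter *g)
{
	return atomic_load_explicit(&g->gen, memory_order_acquire);
}

/*
 * Meant to be called with the serializing lock held.
 *
 * Returns false if some other caller advanced the generation after
 * @snap was taken, in which case the caller can piggyback on that
 * work and skip its own. Otherwise advances the generation, before
 * any work is started, and returns true.
 */
static inline bool gen_try_begin(struct gen_counter *g, unsigned int snap)
{
	if (atomic_load_explicit(&g->gen, memory_order_relaxed) != snap)
		return false;

	/* Conservative assumption: full ordering for the increment. */
	atomic_fetch_add_explicit(&g->gen, 1, memory_order_seq_cst);
	return true;
}
```

A caller would take gen_snapshot() first, then lock, then bail out if gen_try_begin() returns false, which mirrors steps (B), (C) and (D) in the patch below.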
>
> Link: https://lkml.kernel.org/r/CALYGNiPSr-cxV9MX9czaVh6Wz_gzSv3H_8KPvgjBTGbJywUJpA@mail.gmail.com
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> ---
> mm/swap.c | 57 +++++++++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 47 insertions(+), 10 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index bf9a79fed62d..d6910eeed43d 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
> */
> void lru_add_drain_all(void)
> {
> - static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
> - static DEFINE_MUTEX(lock);
> + /*
> + * lru_drain_gen - Current generation of pages that could be in vectors
> + *
> + * (A) Definition: lru_drain_gen = x implies that all generations
> + * 0 < n <= x are already scheduled for draining.
> + *
> + * This is an optimization for the highly-contended use case where a
> + * user space workload keeps constantly generating a flow of pages
> + * for each CPU.
> + */
> + static unsigned int lru_drain_gen;
> static struct cpumask has_work;
> - int cpu, seq;
> + static DEFINE_MUTEX(lock);
> + int cpu, this_gen;
>
> /*
> * Make sure nobody triggers this path before mm_percpu_wq is fully
> @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
> if (WARN_ON(!mm_percpu_wq))
> return;
>
> - seq = raw_read_seqcount_latch(&seqcount);
> + /*
> + * (B) Cache the LRU draining generation number
> + *
> + * smp_rmb() ensures that the counter is loaded before the mutex is
> + * taken. It pairs with the smp_wmb() inside the mutex critical section
> + * at (D).
> + */
> + this_gen = READ_ONCE(lru_drain_gen);
> + smp_rmb();
>
> mutex_lock(&lock);
>
> /*
> - * Piggyback on drain started and finished while we waited for lock:
> - * all pages pended at the time of our enter were drained from vectors.
> + * (C) Exit the draining operation if a newer generation, from another
> + * lru_add_drain_all(), was already scheduled for draining. Check (A).
> */
> - if (__read_seqcount_retry(&seqcount, seq))
> + if (unlikely(this_gen != lru_drain_gen))
> goto done;
>
> - raw_write_seqcount_latch(&seqcount);
> + /*
> + * (D) Increment generation number
> + *
> + * Pairs with READ_ONCE() and smp_rmb() at (B), outside of the critical
> + * section.
> + *
> + * This pairing must be done here, before the for_each_online_cpu loop
> + * below which drains the page vectors.
> + *
> + * Let x, y, and z represent some system CPU numbers, where x < y < z.
> + * Assume CPU #z is in the middle of the for_each_online_cpu loop
> + * below and has already reached CPU #y's per-cpu data. CPU #x comes
> + * along, adds some pages to its per-cpu vectors, then calls
> + * lru_add_drain_all().
> + *
> + * If the paired smp_wmb() below is done at any later step, e.g. after
> + * the loop, CPU #x will just exit at (C) and miss flushing out all of
> + * its added pages.
> + */
> + WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> + smp_wmb();
>
> cpumask_clear(&has_work);
> -
> for_each_online_cpu(cpu) {
> struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
>
> @@ -766,7 +803,7 @@ void lru_add_drain_all(void)
> {
> lru_add_drain();
> }
> -#endif
> +#endif /* CONFIG_SMP */
>
> /**
> * release_pages - batched put_page()
>
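
The ordering argument in comment (D) of the quoted patch can be replayed with a small sequential model. This is only a sketch under simplifying assumptions: it is single-threaded, it replays one fixed interleaving by hand, and the names (struct sim, pages_left_on_cpu_x) are invented for illustration. CPU index 0 plays the role of CPU #x, index 1 is #y and index 2 is #z.

```c
#include <stdbool.h>

enum { NCPUS = 3 };

/* Toy model: one generation counter and a page count per CPU. */
struct sim {
	unsigned int gen;
	int pending[NCPUS];
};

/*
 * Replay the interleaving from comment (D) and return how many pages
 * remain in CPU #x's vector after both drain calls have finished.
 */
static int pages_left_on_cpu_x(bool bump_before_loop)
{
	struct sim s = { .gen = 0, .pending = { 0, 5, 5 } };
	unsigned int snap;
	int cpu;

	/* CPU #z holds the lock and starts draining. */
	if (bump_before_loop)
		s.gen++;
	s.pending[0] = 0;	/* loop has already passed CPU #x */

	/* CPU #x adds pages, snapshots the counter, blocks on the lock. */
	s.pending[0] = 4;
	snap = s.gen;

	/* CPU #z finishes the loop and drops the lock. */
	s.pending[1] = 0;
	s.pending[2] = 0;
	if (!bump_before_loop)
		s.gen++;

	/* CPU #x now holds the lock and performs check (C). */
	if (snap != s.gen)
		return s.pending[0];	/* piggybacks and exits early */

	s.gen++;
	for (cpu = 0; cpu < NCPUS; cpu++)
		s.pending[cpu] = 0;
	return s.pending[0];
}
```

In this model, incrementing before the loop lets CPU #x see an unchanged generation and drain its own pages, while incrementing after the loop makes CPU #x exit at (C) with its pages still pending.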
Thread overview: 12+ messages
2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API Ahmed S. Darwish
2020-05-20 11:24 ` Hillf Danton
2020-05-20 12:22 ` Konstantin Khlebnikov [this message]
2020-05-20 13:05 ` Peter Zijlstra
2020-05-22 14:57 ` Peter Zijlstra
2020-05-22 15:17 ` Sebastian A. Siewior
2020-05-22 16:23 ` Peter Zijlstra
2020-05-25 15:24 ` Ahmed S. Darwish
2020-05-25 15:45 ` Peter Zijlstra
2020-05-25 16:10 ` John Ogness
[not found] ` <20200827114044.11173-1-a.darwish@linutronix.de>
2020-08-27 11:40 ` [PATCH v1 2/8] mm/swap: Do not abuse the seqcount_t " Ahmed S. Darwish