Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Ahmed S. Darwish" <a.darwish@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,  Will Deacon <will@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Sebastian A. Siewior" <bigeasy@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	 linux-mm@kvack.org
Subject: Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
Date: Mon, 25 May 2020 18:10:40 +0200
Message-ID: <87y2pg9erj.fsf@vostro.fn.ogness.net> (raw)
In-Reply-To: <20200522145707.GO325280@hirez.programming.kicks-ass.net> (Peter Zijlstra's message of "Fri, 22 May 2020 16:57:07 +0200")


[-- Attachment #1: Type: text/plain, Size: 2799 bytes --]

Hi,

This optimization is broken. The main concern here: Is it possible that
lru_add_drain_all() _would_ have drained pagevec X, but then aborted
because another lru_add_drain_all() is underway and that other task will
_not_ drain pagevec X? I claim the answer is yes!

My suggested changes are inline below.

I attached a litmus test to verify it.

On 2020-05-22, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, May 19, 2020 at 11:45:24PM +0200, Ahmed S. Darwish wrote:
>> @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
>>   */
>>  void lru_add_drain_all(void)
>>  {
>
>> +	static unsigned int lru_drain_gen;
>>  	static struct cpumask has_work;
>> +	static DEFINE_MUTEX(lock);
>> +	int cpu, this_gen;
>>  
>>  	/*
>>  	 * Make sure nobody triggers this path before mm_percpu_wq is fully
>> @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
>>  	if (WARN_ON(!mm_percpu_wq))
>>  		return;
>>  

An smp_mb() is needed here.

	/*
	 * Guarantee the pagevec counter stores visible by
	 * this CPU are visible to other CPUs before loading
	 * the current drain generation.
	 */
	smp_mb();

>> +	this_gen = READ_ONCE(lru_drain_gen);
>> +	smp_rmb();
>
> 	this_gen = smp_load_acquire(&lru_drain_gen);
>>  
>>  	mutex_lock(&lock);
>>  
>>  	/*
>> +	 * (C) Exit the draining operation if a newer generation, from another
>> +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
>>  	 */
>> +	if (unlikely(this_gen != lru_drain_gen))
>>  		goto done;
>>  
>
>> +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
>> +	smp_wmb();

Instead of smp_wmb(), this needs to be a full memory barrier.

	/*
	 * Guarantee the new drain generation is stored before
	 * loading the pagevec counters.
	 */
	smp_mb();

> You can leave this smp_wmb() out and rely on the smp_mb() implied by
> queue_work_on()'s test_and_set_bit().
>
>>  	cpumask_clear(&has_work);
>> -
>>  	for_each_online_cpu(cpu) {
>>  		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
>>  
>
> While you're here, do:
>
> 	s/cpumask_set_cpu/__&/
>
>> @@ -766,7 +803,7 @@ void lru_add_drain_all(void)
>>  {
>>  	lru_add_drain();
>>  }
>> -#endif
>> +#endif /* CONFIG_SMP */
>>  
>>  /**
>>   * release_pages - batched put_page()

For the litmus test:

1:rx=0             (P1 did not see the pagevec counter)
2:rx=1             (P2 _would_ have seen the pagevec counter)
2:ry1=0 /\ 2:ry2=1 (P2 aborted due to optimization)

Changing the smp_mb() back to smp_wmb() in P1 and removing the smp_mb()
in P2 represents this patch. And it shows that sometimes P2 will abort
even though it would have drained the pagevec and P1 did not drain the
pagevec.

This is ugly as hell. And there maybe other memory barrier types to make
it pretty. But as is, memory barriers are missing.

John Ogness


[-- Attachment #2: lru_add_drain_all.litmus --]
[-- Type: text/plain, Size: 1069 bytes --]

C lru_add_drain_all

(*
 * x is a pagevec counter
 * y is @lru_drain_gen
 * z is @lock
 *)

{ }

P0(int *x)
{
	// mark pagevec for draining
	WRITE_ONCE(*x, 1);
}

P1(int *x, int *y, int *z)
{
	int rx;
	int rz;

	// mutex_lock(&lock);
	rz = cmpxchg_acquire(z, 0, 1);
	if (rz == 0) {

		// WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
		WRITE_ONCE(*y, 1);

		// guarantee lru_drain_gen store before loading pagevec
		smp_mb();

		// if (pagevec_count(...))
		rx = READ_ONCE(*x);

		// mutex_unlock(&lock);
		rz = cmpxchg_release(z, 1, 2);
	}
}

P2(int *x, int *y, int *z)
{
	int rx;
	int ry1;
	int ry2;
	int rz;

	// the pagevec counter as visible now to this CPU
	rx = READ_ONCE(*x);

	// guarantee pagevec store before loading lru_drain_gen
	smp_mb();

	// this_gen = READ_ONCE(lru_drain_gen); smp_rmb();
	ry1 = smp_load_acquire(y);

	// mutex_lock(&lock) - acquired after P1
	rz = cmpxchg_acquire(z, 2, 3);
	if (rz == 2) {

		// if (unlikely(this_gen != lru_drain_gen))
		ry2 = READ_ONCE(*y);
	}
}

locations [x; y; z]
exists (1:rx=0 /\ 2:rx=1 /\ 2:ry1=0 /\ 2:ry2=1)

  parent reply index

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API Ahmed S. Darwish
2020-05-20 11:24   ` Hillf Danton
2020-05-20 12:22   ` Konstantin Khlebnikov
2020-05-20 13:05     ` Peter Zijlstra
2020-05-22 14:57   ` Peter Zijlstra
2020-05-22 15:17     ` Sebastian A. Siewior
2020-05-22 16:23       ` Peter Zijlstra
2020-05-25 15:24     ` Ahmed S. Darwish
2020-05-25 15:45       ` Peter Zijlstra
2020-05-25 16:10     ` John Ogness [this message]
     [not found] ` <20200827114044.11173-1-a.darwish@linutronix.de>
2020-08-27 11:40   ` [PATCH v1 2/8] mm/swap: Do not abuse the seqcount_t " Ahmed S. Darwish

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y2pg9erj.fsf@vostro.fn.ogness.net \
    --to=john.ogness@linutronix.de \
    --cc=a.darwish@linutronix.de \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git