From: Boqun Feng <boqun.feng@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>,
	tglx@linutronix.de, linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@kernel.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Waiman Long <longman@redhat.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Mike Galbraith <efault@gmx.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>
Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
Date: Sun, 12 Sep 2021 11:57:22 +0800
Message-ID: <YT16ognizWI6xROs@boqun-archlinux>
In-Reply-To: <YToZ4h/nfsrD3JfY@hirez.programming.kicks-ass.net>

On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 09, 2021 at 02:45:24PM +0100, Will Deacon wrote:
> > On Thu, Sep 09, 2021 at 12:59:16PM +0200, Peter Zijlstra wrote:
> > > While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> > > it really ought to use smp_store_mb(), because something like:
> > > 
> > > 	current_save_and_set_rtlock_wait_state();
> > > 	for (;;) {
> > > 		if (try_lock())
> > > 			break;
> > > 
> > > 		raw_spin_unlock_irq(&lock->wait_lock);
> > > 		schedule();
> > > 		raw_spin_lock_irq(&lock->wait_lock);
> > > 
> > > 		set_current_state(TASK_RTLOCK_WAIT);
> > > 	}
> > > 	current_restore_rtlock_saved_state();
> > > 
> > > which is the advertised usage in the comment, is actually broken,
> > > since trylock() will only need a load-acquire in general and that
> > > could be re-ordered against the state store, which could lead to a
> > > missed wakeup -> BAD (tm).
> > 
> > Why doesn't the UNLOCK of pi_lock in current_save_and_set_rtlock_wait_state()
> > order the state change before the successful try_lock? I'm just struggling
> > to envisage how this actually goes wrong.
> 
> Moo yes, so the earlier changelog I wrote was something like:
> 
> 	current_save_and_set_rtlock_wait_state();
> 	for (;;) {
> 		if (try_lock())
> 			break;
> 
> 		raw_spin_unlock_irq(&lock->wait_lock);
> 		if (!cond)
> 			schedule();
> 		raw_spin_lock_irq(&lock->wait_lock);
> 
> 		set_current_state(TASK_RTLOCK_WAIT);
> 	}
> 	current_restore_rtlock_saved_state();
> 
> which is more what the code looks like before these patches, and in that
> case the @cond load can be lifted before __state.
> 
> It all sorta works in the current application because most things are
> serialized by ->wait_lock, but given the 'normal' wait pattern I got
> highly suspicious of there not being a full barrier around.

Hmm.. I think ->pi_lock actually protects us here. IIUC, a missed
wake-up would happen if try_to_wake_up() failed to observe the __state
change by the about-to-wait task, and the about-to-wait task didn't
observe the condition set by the waker task, for example:

	TASK 0				TASK 1
	======				======
					cond = 1;
					...
					try_to_wake_up(t0, TASK_RTLOCK_WAIT, ..):
					  ttwu_state_match(...)
					    if (t0->__state & TASK_RTLOCK_WAIT) // false
					      ..
					    return false; // don't wake up
	...
	current->__state = TASK_RTLOCK_WAIT
	...
	if (!cond) // !cond is true because of memory reordering
	  schedule(); // sleep, and may never be woken up again.
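
The interleaving above can be replayed in a small single-threaded toy
model. This is only a sketch, not kernel code: struct model, waker(),
missed_wakeup() and the MODEL_* values are all made up for illustration,
and the "reordered" flag stands in for the cond load being satisfied
before the __state store becomes visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; these are not the kernel's task state bits. */
enum { MODEL_RUNNING = 0, MODEL_RTLOCK_WAIT = 1 };

struct model {
	int cond;         /* condition set by the waker (TASK 1) */
	int state;        /* stands in for t0->__state */
	bool wakeup_sent; /* did the waker decide to wake TASK 0? */
};

/* TASK 1: set the condition, then wake TASK 0 only if it looks like a waiter. */
static void waker(struct model *m)
{
	m->cond = 1;
	if (m->state & MODEL_RTLOCK_WAIT)
		m->wakeup_sent = true;
	assert(m->cond == 1);
}

/*
 * Replay the interleaving with TASK 1 running before TASK 0's state store.
 * Returns true if TASK 0 would call schedule() with no wakeup pending.
 */
bool missed_wakeup(bool reordered)
{
	struct model m = { 0, MODEL_RUNNING, false };
	int seen_cond;

	if (reordered) {
		seen_cond = m.cond;          /* early load: sees 0 */
		waker(&m);                   /* still sees MODEL_RUNNING */
		m.state = MODEL_RTLOCK_WAIT; /* store lands too late */
	} else {
		waker(&m);                   /* still sees MODEL_RUNNING */
		m.state = MODEL_RTLOCK_WAIT;
		seen_cond = m.cond;          /* in-order load: sees 1 */
	}

	/* TASK 0 sleeps only when it saw cond == 0. */
	return !seen_cond && !m.wakeup_sent;
}
```

In this model missed_wakeup(true) reports the lost wakeup, while
missed_wakeup(false) does not, because the in-order load of cond sees
the waker's store and TASK 0 skips schedule().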

But let's add ->pi_lock critical sections into the example:

	TASK 0				TASK 1
	======				======
					cond = 1;
					...
					try_to_wake_up(t0, TASK_RTLOCK_WAIT, ..):
					  raw_spin_lock_irqsave(->pi_lock,...);
					  ttwu_state_match(...)
					    if (t0->__state & TASK_RTLOCK_WAIT) // false
					      ..
					    return false; // don't wake up
					  raw_spin_unlock_irqrestore(->pi_lock,...); // A
	...
	raw_spin_lock_irqsave(->pi_lock, ...); // B
	current->__state = TASK_RTLOCK_WAIT
	raw_spin_unlock_irqrestore(->pi_lock, ...);
	if (!cond)
	  schedule();

Now the read of cond on TASK0 must observe the store of cond on TASK1:
accesses to __state are serialized by ->pi_lock, so if TASK1's read of
__state didn't observe TASK0's write to __state, then the lock B must
read from the unlock A (or from another unlock co-after A). That gives
us a release-acquire pair which guarantees that the read of cond on
TASK0 sees the write of cond on TASK1. This can be simplified into the
litmus test below:

	C unlock-lock
	{
	}

	P0(spinlock_t *s, int *cond, int *state)
	{
		int r1;

		spin_lock(s);
		WRITE_ONCE(*state, 1);
		spin_unlock(s);
		r1 = READ_ONCE(*cond);
	}

	P1(spinlock_t *s, int *cond, int *state)
	{
		int r1;

		WRITE_ONCE(*cond, 1);
		spin_lock(s);
		r1 = READ_ONCE(*state);
		spin_unlock(s);
	}

	exists (0:r1=0 /\ 1:r1=0)

and the result is:

	Test unlock-lock Allowed
	States 3
	0:r1=0; 1:r1=1;
	0:r1=1; 1:r1=0;
	0:r1=1; 1:r1=1;
	No
	Witnesses
	Positive: 0 Negative: 3
	Condition exists (0:r1=0 /\ 1:r1=0)
	Observation unlock-lock Never 0 3
	Time unlock-lock 0.01
	Hash=e1f914505f07e380405f65d3b0fb6940
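
For intuition only, the release-acquire argument can also be written as
a deliberately pessimistic toy model in plain C. This is not LKMM and
not what herd7 does: run_unlock_lock(), struct outcome and the W_* bits
are hypothetical names. Each thread carries a "view" of the writes it is
guaranteed to see, unlock publishes that view into the lock, lock merges
it, and a read returns 1 only when the write is in the reader's view.

```c
#include <stdbool.h>

enum { W_COND = 1 << 0, W_STATE = 1 << 1 }; /* one bit per write */

struct outcome {
	int p0_r1; /* P0: r1 = READ_ONCE(*cond)  */
	int p1_r1; /* P1: r1 = READ_ONCE(*state) */
};

/*
 * p0_locks_first selects which critical section comes first.
 * use_lock == false drops the publish/merge steps, i.e. models the
 * same accesses with no lock (and no other ordering) at all.
 */
struct outcome run_unlock_lock(bool p0_locks_first, bool use_lock)
{
	unsigned int lock_view = 0, v0 = 0, v1 = 0;
	struct outcome o;

	v1 |= W_COND; /* P1 writes cond before taking the lock */

	if (p0_locks_first) {
		v0 |= W_STATE;             /* P0: write state under lock */
		if (use_lock)
			lock_view |= v0;   /* P0: unlock publishes its view */
		if (use_lock)
			v1 |= lock_view;   /* P1: lock merges that view */
		o.p1_r1 = !!(v1 & W_STATE);
	} else {
		o.p1_r1 = !!(v1 & W_STATE); /* P1 reads state first: 0 */
		if (use_lock)
			lock_view |= v1;   /* unlock A publishes cond */
		if (use_lock)
			v0 |= lock_view;   /* lock B merges it */
		v0 |= W_STATE;
	}

	o.p0_r1 = !!(v0 & W_COND); /* P0 reads cond after its unlock */
	return o;
}
```

With use_lock set, the two lock orders give (0:r1=0, 1:r1=1) and
(0:r1=1, 1:r1=0), never both zero. Because reads are pessimistic, the
model does not produce the third herd7 state (0:r1=1, 1:r1=1). With
use_lock clear, both reads return 0, which is the missed wake-up case.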

In short, since we write to __state with ->pi_lock held, I don't
think we need smp_store_mb() for __state. But maybe I'm missing
something subtle here ;-)

Regards,
Boqun
