All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Will Deacon <will@kernel.org>, Jan Kara <jack@suse.cz>,
	Waiman Long <longman@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>
Subject: Re: Crash with PREEMPT_RT on aarch64 machine
Date: Wed, 30 Nov 2022 20:22:04 +0000	[thread overview]
Message-ID: <20221130202204.usku3rl6wowiugju@suse.de> (raw)
In-Reply-To: <Y4Tapja2qq8HiHBZ@linutronix.de>

On Mon, Nov 28, 2022 at 04:58:30PM +0100, Sebastian Andrzej Siewior wrote:
> How about this?
> 
> - The fast path is easy???
> 
> - The slow path first sets the WAITER bits (mark_rt_mutex_waiters()) so
>   I made that one _acquire so that it is visible by the unlocker forcing
>   everyone into slow path.
> 
> - If the lock is acquired, then the owner is written via
>   rt_mutex_set_owner(). This happens under wait_lock so it is
>   serialized and so a WRITE_ONCE() is used to write the final owner. I
>   replaced it with a cmpxchg_acquire() to have the owner there.
>   Not sure if I shouldn't make this as you put it:
> |   e.g. by making use of dependency ordering where it already exists.
>   The other (locking) CPU needs to see the owner not only the WAITER
>   bit. I'm not sure if this could be replaced with smp_store_release().
> 
> - After the whole operation completes, fixup_rt_mutex_waiters() cleans
>   the WAITER bit and I kept the _acquire semantic here. Now looking at
>   it again, I don't think that needs to be done since that shouldn't be
>   set here.
> 
> - There is rtmutex_spin_on_owner() which (as the name implies) spins on
>   the owner as long as it active. It obtains it via READ_ONCE() and I'm
>   not sure if any memory barrier is needed. Worst case is that it will
>   spin while owner isn't set if it observers a stale value.
> 
> - The unlock path first clears the waiter bit if there are no waiters
>   recorded (via simple assignments under the wait_lock (every locker
>   will fail with the cmpxchg_acquire() and go for the wait_lock)) and
>   then finally drop it via rt_mutex_cmpxchg_release(,, NULL).
>   Should there be a wait, it will just store the WAITER bit with
>   smp_store_release() (setting the owner is NULL but the WAITER bit
>   forces everyone into the slow path).
> 
> - Added rt_mutex_set_owner_pi() which does simple assignment. This is
>   used from the futex code and here everything happens under a lock.
> 
> - I added a smp_load_acquire() to rt_mutex_base_is_locked() since I
>   *think* want to observe a real waiter and not something stale.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

include/linux/rtmutex.h needs to also include asm/barrier.h to resolve
some build problems. Once that was resolved, 10 iterations of the dbench
work completed successfully and without the patch, 1 iteration could not
complete.

Review is trickier as I'm not spent any reasonable amount of time on locking
primitives. I'd have to defer to Peter in that regard but I skimmed it at
least before wrapping up for the evening.

> ---
>  include/linux/rtmutex.h      |  2 +-
>  kernel/locking/rtmutex.c     | 26 ++++++++++++++++++--------
>  kernel/locking/rtmutex_api.c |  4 ++--
>  3 files changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
> index 7d049883a08ac..4447e01f723d4 100644
> --- a/include/linux/rtmutex.h
> +++ b/include/linux/rtmutex.h
> @@ -41,7 +41,7 @@ struct rt_mutex_base {
>   */
>  static inline bool rt_mutex_base_is_locked(struct rt_mutex_base *lock)
>  {
> -	return READ_ONCE(lock->owner) != NULL;
> +	return smp_load_acquire(&lock->owner) != NULL;
>  }
>  

I don't believe this is necessary. It's only needed when checking if a
lock is acquired or not and it's inherently race-prone. It's harmless if
a stale value is observed and it does not pair with a release. Mostly it's
useful for debugging checks.

>  extern void rt_mutex_base_init(struct rt_mutex_base *rtb);
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 7779ee8abc2a0..e3cc673e0c988 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -97,7 +97,7 @@ rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
>  	if (rt_mutex_has_waiters(lock))
>  		val |= RT_MUTEX_HAS_WAITERS;
>  
> -	WRITE_ONCE(lock->owner, (struct task_struct *)val);
> +	WARN_ON_ONCE(cmpxchg_acquire(&lock->owner, RT_MUTEX_HAS_WAITERS, val) != RT_MUTEX_HAS_WAITERS);
>  }
>  
>  static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
> @@ -106,6 +106,17 @@ static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
>  			((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
>  }
>  
> +static __always_inline void
> +rt_mutex_set_owner_pi(struct rt_mutex_base *lock, struct task_struct *owner)
> +{

What does pi mean in this context? I think the naming here might
misleading. rt_mutex_set_owner_pi is used when initialising and when
clearing the owner. rt_mutex_set_owner is set when acquiring the lock.

Consider renaming rt_mutex_set_owner_pi to rt_mutex_clear_owner. The init
could still use rt_mutex_set_owner as an extra barrier is not a big deal
during init if the straight assignment was unpopular.  The init could also
do a plain assignment because it cannot have any waiters yet.

What is less obvious is if rt_mutex_clear_owner should have explicit release
semantics to pair with rt_mutex_set_owner. It looks like it might not
matter because at least some paths end up having release semantics anyway
due to a spinlock but I didn't check all cases and it's potentially fragile.

> +	unsigned long val = (unsigned long)owner;
> +
> +	if (rt_mutex_has_waiters(lock))
> +		val |= RT_MUTEX_HAS_WAITERS;
> +
> +	WRITE_ONCE(lock->owner, val);
> +}
> +
>  static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
>  {
>  	unsigned long owner, *p = (unsigned long *) &lock->owner;
> @@ -173,7 +184,7 @@ static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
>  	 */
>  	owner = READ_ONCE(*p);
>  	if (owner & RT_MUTEX_HAS_WAITERS)
> -		WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
> +		cmpxchg_acquire(p, owner, owner & ~RT_MUTEX_HAS_WAITERS);
>  }
>  
>  /*
> @@ -196,17 +207,16 @@ static __always_inline bool rt_mutex_cmpxchg_release(struct rt_mutex_base *lock,
>  }
>  
>  /*
> - * Callers must hold the ->wait_lock -- which is the whole purpose as we force
> - * all future threads that attempt to [Rmw] the lock to the slowpath. As such
> - * relaxed semantics suffice.
> + * Callers hold the ->wait_lock. This needs to be visible to the lockowner,
> + * forcing him into the slow path for the unlock operation.
>   */
>  static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
>  {
>  	unsigned long owner, *p = (unsigned long *) &lock->owner;
>  
>  	do {
> -		owner = *p;
> -	} while (cmpxchg_relaxed(p, owner,
> +		owner = READ_ONCE(*p);
> +	} while (cmpxchg_acquire(p, owner,
>  				 owner | RT_MUTEX_HAS_WAITERS) != owner);
>  }
>  

Not 100% sure although I see it's to cover an exit path from
try_to_take_rt_mutex. I'm undecided if rt_mutex_set_owner having acquire
semantics and rt_mutex_clear_owner having clear semantics would be
sufficient. try_to_take_rt_mutex can still return with release semantics
but only in the case where it fails to acquire the lock.

> @@ -1218,7 +1228,7 @@ static void __sched mark_wakeup_next_waiter(struct rt_wake_q_head *wqh,
>  	 * slow path making sure no task of lower priority than
>  	 * the top waiter can steal this lock.
>  	 */
> -	lock->owner = (void *) RT_MUTEX_HAS_WAITERS;
> +	smp_store_release(&lock->owner, (void *) RT_MUTEX_HAS_WAITERS);
>  
>  	/*
>  	 * We deboosted before waking the top waiter task such that we don't

This is within a locked section and would definitely see a barrier if
rt_mutex_wake_q_add_task calls wake_q_add but it's less clear if the
optimisation in rt_mutex_wake_q_add_task could race so I'm undecided if
it's necessary or not.

> diff --git a/kernel/locking/rtmutex_api.c b/kernel/locking/rtmutex_api.c
> index 900220941caac..9acc176f93d38 100644
> --- a/kernel/locking/rtmutex_api.c
> +++ b/kernel/locking/rtmutex_api.c
> @@ -249,7 +249,7 @@ void __sched rt_mutex_init_proxy_locked(struct rt_mutex_base *lock,
>  	 * recursion. Give the futex/rtmutex wait_lock a separate key.
>  	 */
>  	lockdep_set_class(&lock->wait_lock, &pi_futex_key);
> -	rt_mutex_set_owner(lock, proxy_owner);
> +	rt_mutex_set_owner_pi(lock, proxy_owner);
>  }
>  
>  /**
> @@ -267,7 +267,7 @@ void __sched rt_mutex_init_proxy_locked(struct rt_mutex_base *lock,
>  void __sched rt_mutex_proxy_unlock(struct rt_mutex_base *lock)
>  {
>  	debug_rt_mutex_proxy_unlock(lock);
> -	rt_mutex_set_owner(lock, NULL);
> +	rt_mutex_set_owner_pi(lock, NULL);
>  }
>  
>  /**
> -- 
> 2.38.1
> 

-- 
Mel Gorman
SUSE Labs

  parent reply	other threads:[~2022-11-30 20:23 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-03 11:54 Crash with PREEMPT_RT on aarch64 machine Jan Kara
2022-11-04  8:06 ` Hillf Danton
2022-11-07 12:41   ` Jan Kara
2022-11-07 14:43     ` Hillf Danton
2022-11-04 16:30 ` Sebastian Andrzej Siewior
2022-11-07 13:56   ` Jan Kara
2022-11-07 15:10     ` Sebastian Andrzej Siewior
2022-11-07 16:30       ` Jan Kara
2022-11-07 17:12         ` Sebastian Andrzej Siewior
2022-11-07 16:49       ` Waiman Long
2022-11-08 10:53         ` Mark Rutland
2022-11-08 17:45           ` Jan Kara
2022-11-09  9:55             ` Mark Rutland
2022-11-09 10:11               ` Pierre Gondois
2022-11-09 10:54                 ` Jan Kara
2022-11-09 11:01               ` Jan Kara
2022-11-09 13:52                 ` Pierre Gondois
2022-11-09 14:21                   ` Pierre Gondois
2022-11-09 12:57         ` Will Deacon
2022-11-09 15:40           ` Jan Kara
2022-11-11 14:27             ` Jan Kara
2022-11-14 12:41               ` Will Deacon
2022-11-28 15:58                 ` Sebastian Andrzej Siewior
2022-11-28 20:30                   ` kernel test robot
2022-11-28 21:11                   ` kernel test robot
2022-11-29  5:16                   ` kernel test robot
2022-11-29  5:26                   ` kernel test robot
2022-11-29  6:48                   ` kernel test robot
2022-11-29  7:39                   ` kernel test robot
2022-11-30 17:20                   ` Pierre Gondois
2022-12-01 12:37                     ` Jan Kara
2022-11-30 20:22                   ` Mel Gorman [this message]
2022-12-01 17:09                     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221130202204.usku3rl6wowiugju@suse.de \
    --to=mgorman@suse.de \
    --cc=bigeasy@linutronix.de \
    --cc=catalin.marinas@arm.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.