From: Mel Gorman <mgorman@suse.de>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Will Deacon <will@kernel.org>, Jan Kara <jack@suse.cz>,
Waiman Long <longman@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Steven Rostedt <rostedt@goodmis.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>
Subject: Re: Crash with PREEMPT_RT on aarch64 machine
Date: Wed, 30 Nov 2022 20:22:04 +0000 [thread overview]
Message-ID: <20221130202204.usku3rl6wowiugju@suse.de> (raw)
In-Reply-To: <Y4Tapja2qq8HiHBZ@linutronix.de>
On Mon, Nov 28, 2022 at 04:58:30PM +0100, Sebastian Andrzej Siewior wrote:
> How about this?
>
> - The fast path is easy???
>
> - The slow path first sets the WAITER bits (mark_rt_mutex_waiters()) so
> I made that one _acquire so that it is visible by the unlocker forcing
> everyone into slow path.
>
> - If the lock is acquired, then the owner is written via
> rt_mutex_set_owner(). This happens under wait_lock so it is
> serialized and so a WRITE_ONCE() is used to write the final owner. I
> replaced it with a cmpxchg_acquire() to have the owner there.
> Not sure if I shouldn't make this as you put it:
> | e.g. by making use of dependency ordering where it already exists.
> The other (locking) CPU needs to see the owner not only the WAITER
> bit. I'm not sure if this could be replaced with smp_store_release().
>
> - After the whole operation completes, fixup_rt_mutex_waiters() cleans
> the WAITER bit and I kept the _acquire semantic here. Now looking at
> it again, I don't think that needs to be done since that shouldn't be
> set here.
>
> - There is rtmutex_spin_on_owner() which (as the name implies) spins on
> the owner as long as it active. It obtains it via READ_ONCE() and I'm
> not sure if any memory barrier is needed. Worst case is that it will
> spin while owner isn't set if it observers a stale value.
>
> - The unlock path first clears the waiter bit if there are no waiters
> recorded (via simple assignments under the wait_lock (every locker
> will fail with the cmpxchg_acquire() and go for the wait_lock)) and
> then finally drop it via rt_mutex_cmpxchg_release(,, NULL).
> Should there be a wait, it will just store the WAITER bit with
> smp_store_release() (setting the owner is NULL but the WAITER bit
> forces everyone into the slow path).
>
> - Added rt_mutex_set_owner_pi() which does simple assignment. This is
> used from the futex code and here everything happens under a lock.
>
> - I added a smp_load_acquire() to rt_mutex_base_is_locked() since I
> *think* want to observe a real waiter and not something stale.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
include/linux/rtmutex.h needs to also include asm/barrier.h to resolve
some build problems. Once that was resolved, 10 iterations of the dbench
work completed successfully and without the patch, 1 iteration could not
complete.
Review is trickier as I'm not spent any reasonable amount of time on locking
primitives. I'd have to defer to Peter in that regard but I skimmed it at
least before wrapping up for the evening.
> ---
> include/linux/rtmutex.h | 2 +-
> kernel/locking/rtmutex.c | 26 ++++++++++++++++++--------
> kernel/locking/rtmutex_api.c | 4 ++--
> 3 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
> index 7d049883a08ac..4447e01f723d4 100644
> --- a/include/linux/rtmutex.h
> +++ b/include/linux/rtmutex.h
> @@ -41,7 +41,7 @@ struct rt_mutex_base {
> */
> static inline bool rt_mutex_base_is_locked(struct rt_mutex_base *lock)
> {
> - return READ_ONCE(lock->owner) != NULL;
> + return smp_load_acquire(&lock->owner) != NULL;
> }
>
I don't believe this is necessary. It's only needed when checking if a
lock is acquired or not and it's inherently race-prone. It's harmless if
a stale value is observed and it does not pair with a release. Mostly it's
useful for debugging checks.
> extern void rt_mutex_base_init(struct rt_mutex_base *rtb);
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 7779ee8abc2a0..e3cc673e0c988 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -97,7 +97,7 @@ rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
> if (rt_mutex_has_waiters(lock))
> val |= RT_MUTEX_HAS_WAITERS;
>
> - WRITE_ONCE(lock->owner, (struct task_struct *)val);
> + WARN_ON_ONCE(cmpxchg_acquire(&lock->owner, RT_MUTEX_HAS_WAITERS, val) != RT_MUTEX_HAS_WAITERS);
> }
>
> static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
> @@ -106,6 +106,17 @@ static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
> ((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
> }
>
> +static __always_inline void
> +rt_mutex_set_owner_pi(struct rt_mutex_base *lock, struct task_struct *owner)
> +{
What does pi mean in this context? I think the naming here might
misleading. rt_mutex_set_owner_pi is used when initialising and when
clearing the owner. rt_mutex_set_owner is set when acquiring the lock.
Consider renaming rt_mutex_set_owner_pi to rt_mutex_clear_owner. The init
could still use rt_mutex_set_owner as an extra barrier is not a big deal
during init if the straight assignment was unpopular. The init could also
do a plain assignment because it cannot have any waiters yet.
What is less obvious is if rt_mutex_clear_owner should have explicit release
semantics to pair with rt_mutex_set_owner. It looks like it might not
matter because at least some paths end up having release semantics anyway
due to a spinlock but I didn't check all cases and it's potentially fragile.
> + unsigned long val = (unsigned long)owner;
> +
> + if (rt_mutex_has_waiters(lock))
> + val |= RT_MUTEX_HAS_WAITERS;
> +
> + WRITE_ONCE(lock->owner, val);
> +}
> +
> static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
> {
> unsigned long owner, *p = (unsigned long *) &lock->owner;
> @@ -173,7 +184,7 @@ static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
> */
> owner = READ_ONCE(*p);
> if (owner & RT_MUTEX_HAS_WAITERS)
> - WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
> + cmpxchg_acquire(p, owner, owner & ~RT_MUTEX_HAS_WAITERS);
> }
>
> /*
> @@ -196,17 +207,16 @@ static __always_inline bool rt_mutex_cmpxchg_release(struct rt_mutex_base *lock,
> }
>
> /*
> - * Callers must hold the ->wait_lock -- which is the whole purpose as we force
> - * all future threads that attempt to [Rmw] the lock to the slowpath. As such
> - * relaxed semantics suffice.
> + * Callers hold the ->wait_lock. This needs to be visible to the lockowner,
> + * forcing him into the slow path for the unlock operation.
> */
> static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
> {
> unsigned long owner, *p = (unsigned long *) &lock->owner;
>
> do {
> - owner = *p;
> - } while (cmpxchg_relaxed(p, owner,
> + owner = READ_ONCE(*p);
> + } while (cmpxchg_acquire(p, owner,
> owner | RT_MUTEX_HAS_WAITERS) != owner);
> }
>
Not 100% sure although I see it's to cover an exit path from
try_to_take_rt_mutex. I'm undecided if rt_mutex_set_owner having acquire
semantics and rt_mutex_clear_owner having clear semantics would be
sufficient. try_to_take_rt_mutex can still return with release semantics
but only in the case where it fails to acquire the lock.
> @@ -1218,7 +1228,7 @@ static void __sched mark_wakeup_next_waiter(struct rt_wake_q_head *wqh,
> * slow path making sure no task of lower priority than
> * the top waiter can steal this lock.
> */
> - lock->owner = (void *) RT_MUTEX_HAS_WAITERS;
> + smp_store_release(&lock->owner, (void *) RT_MUTEX_HAS_WAITERS);
>
> /*
> * We deboosted before waking the top waiter task such that we don't
This is within a locked section and would definitely see a barrier if
rt_mutex_wake_q_add_task calls wake_q_add but it's less clear if the
optimisation in rt_mutex_wake_q_add_task could race so I'm undecided if
it's necessary or not.
> diff --git a/kernel/locking/rtmutex_api.c b/kernel/locking/rtmutex_api.c
> index 900220941caac..9acc176f93d38 100644
> --- a/kernel/locking/rtmutex_api.c
> +++ b/kernel/locking/rtmutex_api.c
> @@ -249,7 +249,7 @@ void __sched rt_mutex_init_proxy_locked(struct rt_mutex_base *lock,
> * recursion. Give the futex/rtmutex wait_lock a separate key.
> */
> lockdep_set_class(&lock->wait_lock, &pi_futex_key);
> - rt_mutex_set_owner(lock, proxy_owner);
> + rt_mutex_set_owner_pi(lock, proxy_owner);
> }
>
> /**
> @@ -267,7 +267,7 @@ void __sched rt_mutex_init_proxy_locked(struct rt_mutex_base *lock,
> void __sched rt_mutex_proxy_unlock(struct rt_mutex_base *lock)
> {
> debug_rt_mutex_proxy_unlock(lock);
> - rt_mutex_set_owner(lock, NULL);
> + rt_mutex_set_owner_pi(lock, NULL);
> }
>
> /**
> --
> 2.38.1
>
--
Mel Gorman
SUSE Labs
next prev parent reply other threads:[~2022-11-30 20:23 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-03 11:54 Crash with PREEMPT_RT on aarch64 machine Jan Kara
2022-11-04 16:30 ` Sebastian Andrzej Siewior
2022-11-07 13:56 ` Jan Kara
2022-11-07 15:10 ` Sebastian Andrzej Siewior
2022-11-07 16:30 ` Jan Kara
2022-11-07 17:12 ` Sebastian Andrzej Siewior
2022-11-07 16:49 ` Waiman Long
2022-11-08 10:53 ` Mark Rutland
2022-11-08 17:45 ` Jan Kara
2022-11-09 9:55 ` Mark Rutland
2022-11-09 10:11 ` Pierre Gondois
2022-11-09 10:54 ` Jan Kara
2022-11-09 11:01 ` Jan Kara
2022-11-09 13:52 ` Pierre Gondois
2022-11-09 14:21 ` Pierre Gondois
2022-11-09 12:57 ` Will Deacon
2022-11-09 15:40 ` Jan Kara
2022-11-11 14:27 ` Jan Kara
2022-11-14 12:41 ` Will Deacon
2022-11-28 15:58 ` Sebastian Andrzej Siewior
2022-11-28 20:30 ` kernel test robot
2022-11-28 21:11 ` kernel test robot
2022-11-29 5:16 ` kernel test robot
2022-11-29 5:26 ` kernel test robot
2022-11-29 6:48 ` kernel test robot
2022-11-29 7:39 ` kernel test robot
2022-11-30 17:20 ` Pierre Gondois
2022-12-01 12:37 ` Jan Kara
2022-11-30 20:22 ` Mel Gorman [this message]
2022-12-01 17:09 ` Mel Gorman
[not found] ` <20221104080637.626-1-hdanton@sina.com>
2022-11-07 12:41 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221130202204.usku3rl6wowiugju@suse.de \
--to=mgorman@suse.de \
--cc=bigeasy@linutronix.de \
--cc=catalin.marinas@arm.com \
--cc=jack@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).