All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] kernel/rtmutex: Handle non enqueued waiters gracefully in remove_waiter()
@ 2018-03-12 14:28 Sebastian Andrzej Siewior
  2018-03-16 12:20 ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-12 14:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner

In -RT task_blocks_on_rt_mutex() may return with -EAGAIN due to
(->pi_blocked_on == PI_WAKEUP_INPROGRESS) before it added itself as a
waiter. In such a case we must not call remove_waiter() because without
a waiter it will trigger the BUG_ON() statement.

This was initially reported by Yimin Deng. Thomas Gleixner fixed it then
with an explicit check for waiters before calling remove_waiter() with
the following note:

| Guard it with rt_mutex_has_waiters(). This is a quick fix which is
| easy to backport. The proper fix is to have a central check in
| remove_waiter() so we can call it unconditionally.

This is the suggested change.
Now that it is possible to call remove_waiter() unconditionally I also
remove that check from rt_mutex_slowlock().

Link: https://lkml.kernel.org/r/CAAh1qt=DCL9aUXNxanP5BKtiPp3m+qj4yB+gDohhXPVFCxWwzg@mail.gmail.com
Reported-and-debugged-by: Yimin Deng <yimin11.deng@gmail.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/locking/rtmutex.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 65cc0cb984e6..57d28d8f5280 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1068,12 +1068,15 @@ static void mark_wakeup_next_waiter(struct wake_q_head *wake_q,
 static void remove_waiter(struct rt_mutex *lock,
 			  struct rt_mutex_waiter *waiter)
 {
-	bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
+	bool is_top_waiter = false;
 	struct task_struct *owner = rt_mutex_owner(lock);
 	struct rt_mutex *next_lock;
 
 	lockdep_assert_held(&lock->wait_lock);
 
+	if (rt_mutex_has_waiters(lock))
+		is_top_waiter = waiter == rt_mutex_top_waiter(lock);
+
 	raw_spin_lock(&current->pi_lock);
 	rt_mutex_dequeue(lock, waiter);
 	current->pi_blocked_on = NULL;
@@ -1268,8 +1271,7 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
 
 	if (unlikely(ret)) {
 		__set_current_state(TASK_RUNNING);
-		if (rt_mutex_has_waiters(lock))
-			remove_waiter(lock, &waiter);
+		remove_waiter(lock, &waiter);
 		rt_mutex_handle_deadlock(ret, chwalk, &waiter);
 	}
 
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread
* kernel BUG at kernel/rtmutex_common.h:75
@ 2015-11-04 14:35 Yimin Deng
  2015-11-06 14:41 ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Yimin Deng @ 2015-11-04 14:35 UTC (permalink / raw)
  To: linux-rt-users

I encountered “kernel BUG” which was reported in the
rt_mutex_top_waiter() at kernel/rtmutex_common.h:75.

Linux version: 3.12.37-rt51; CONFIG_PREEMPT_RT_FULL is disabled.
Architecture: PowerPC

We ported an application from pSOS RTOS to Linux using the
Xenomai-Mercury (=library to map pSOS task to POSIX threads). And We
have several threads running in the real-time priority domain.
ThreadA: running at prio -59. pthread_mutex_lock() +
pthread_cond_timedwait() + pthread_mutex_unlock()
ThreadB: running at prio -84. pthread_mutex_lock() +
pthread_cond_signal() + pthread_mutex_unlock()

ThreadA:
------ ------
futex_wait_requeue_pi()
    futex_wait_queue_me()
    <timed out>

    raw_spin_lock_irq(&current->pi_lock);
    if (current->pi_blocked_on) {
        raw_spin_unlock_irq(&current->pi_lock);
    } else {
        current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
        raw_spin_unlock_irq(&current->pi_lock);
            <-- ThreadA was interrupted and preempted!
        spin_lock(&hb->lock);


ThreadB:
------ ------
rt_mutex_start_proxy_lock();
    task_blocks_on_rt_mutex();  <-- return "-EAGAIN" due to
"task->pi_blocked_on == PI_WAKEUP_INPROGRESS"
    ...
    if (unlikely(ret))
        remove_waiter(lock, waiter);
            int first = (waiter == rt_mutex_top_waiter(lock));  <--
BUG_ON(w->lock != lock);

It seems that the purpose to call the remove_waiter() is to remove the
waiter added by “plist_add(&waiter->list_entry, &lock->wait_list);” in
the task_blocks_on_rt_mutex(). But in the scenario above there's no
waiter on the lock yet and
the waiter has not been added into the wait list of the lock in the
task_blocks_on_rt_mutex() due to the failure “-EAGAIN”. So it reported
kernel BUG in the rt_mutex_top_waiter().

I modified it as below and the issue seems disappear.
- if (unlikely(ret))
+ if (unlikely(ret && (-EAGAIN != ret)))
       remove_waiter(lock, waiter);

Could the scenario above be possible? If so, how to resolve this issue?
Thanks!

B.R.
Yimin Deng
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-03-28 21:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-12 14:28 [PATCH] kernel/rtmutex: Handle non enqueued waiters gracefully in remove_waiter() Sebastian Andrzej Siewior
2018-03-16 12:20 ` Peter Zijlstra
2018-03-19 15:11   ` Thomas Gleixner
2018-03-27 12:14     ` [PATCH v2] locking/rtmutex: " Sebastian Andrzej Siewior
2018-03-28 21:07       ` [tip:locking/core] " tip-bot for Peter Zijlstra
  -- strict thread matches above, loose matches on Subject: below --
2015-11-04 14:35 kernel BUG at kernel/rtmutex_common.h:75 Yimin Deng
2015-11-06 14:41 ` Thomas Gleixner
2015-11-07 18:09   ` Thomas Gleixner
2015-11-08  3:31     ` Yimin Deng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.