linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil
@ 2018-11-29 12:50 Yongji Xie
  2018-11-29 13:12 ` Peter Zijlstra
  2019-01-21 11:28 ` [tip:locking/core] locking/rwsem: Fix (possible) missed wakeup tip-bot for Xie Yongji
  0 siblings, 2 replies; 50+ messages in thread
From: Yongji Xie @ 2018-11-29 12:50 UTC (permalink / raw)
  To: peterz, mingo, will.deacon
  Cc: linux-kernel, xieyongji, zhangyu31, liuqi16, yuanlinsi01, nixun, lilin24

From: Xie Yongji <xieyongji@baidu.com>

Our system encountered a problem recently, the khungtaskd detected
some process hang on mmap_sem. But the odd thing was that one task which
is not on mmap_sem.wait_list still sleeps in rwsem_down_read_failed().
Through code inspection, we found a potential bug can lead to this.

Imaging this:

Thread 1                                  Thread 2
                                          down_write();
rwsem_down_read_failed()
 raw_spin_lock_irq(&sem->wait_lock);
 list_add_tail(&waiter.list, &wait_list);
 raw_spin_unlock_irq(&sem->wait_lock);
                                          __up_write();
                                           rwsem_wake();
                                            __rwsem_mark_wake();
                                             wake_q_add();
                                             list_del(&waiter->list);
                                             waiter->task = NULL;
 while (true) {
  set_current_state(TASK_UNINTERRUPTIBLE);
  if (!waiter.task) // true
      break;
 }
 __set_current_state(TASK_RUNNING);

Now Thread 1 is queued in Thread 2's wake_q without sleeping. Then
Thread 1 call rwsem_down_read_failed() again because Thread 3
hold the lock, if Thread 3 tries to queue Thread 1 before Thread 2
do wakeup, it will fail and miss wakeup:

Thread 1                                  Thread 2      Thread 3
                                                        down_write();
rwsem_down_read_failed()
 raw_spin_lock_irq(&sem->wait_lock);
 list_add_tail(&waiter.list, &wait_list);
 raw_spin_unlock_irq(&sem->wait_lock);
                                                        __rwsem_mark_wake();
                                                         wake_q_add();
                                          wake_up_q();
                                                         waiter->task = NULL;
 while (true) {
  set_current_state(TASK_UNINTERRUPTIBLE);
  if (!waiter.task) // false
      break;
  schedule();
 }
                                                        wake_up_q(&wake_q);

In another word, that means we might issue the wakeup before setting the reader
waiter to nil. If so, the wakeup may do nothing when it was called before reader
set task state to TASK_UNINTERRUPTIBLE. Then we would have no chance to wake up
the reader any more, and cause other writers such as "ps" command stuck on it.

This patch is not verified because we still have no way to reproduce the problem.
But I'd like to ask for some comments from community firstly.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 kernel/locking/rwsem-xadd.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 09b1800..50d9af6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -198,15 +198,22 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		woken++;
 		tsk = waiter->task;
 
-		wake_q_add(wake_q, tsk);
+		get_task_struct(tsk);
 		list_del(&waiter->list);
 		/*
-		 * Ensure that the last operation is setting the reader
+		 * Ensure calling get_task_struct() before setting the reader
 		 * waiter to nil such that rwsem_down_read_failed() cannot
 		 * race with do_exit() by always holding a reference count
 		 * to the task to wakeup.
 		 */
 		smp_store_release(&waiter->task, NULL);
+		/*
+		 * Ensure issuing the wakeup (either by us or someone else)
+		 * after setting the reader waiter to nil.
+		 */
+		wake_q_add(wake_q, tsk);
+		/* wake_q_add() already take the task ref */
+		put_task_struct(tsk);
 	}
 
 	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
-- 
2.2.3


^ permalink raw reply related	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2019-02-12 14:14 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-29 12:50 [RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil Yongji Xie
2018-11-29 13:12 ` Peter Zijlstra
2018-11-29 13:44   ` Peter Zijlstra
2018-11-29 14:02     ` Yongji Xie
2018-11-29 18:43     ` Davidlohr Bueso
2018-11-29 18:49       ` Waiman Long
2018-11-29 15:21   ` Waiman Long
2018-11-29 15:29     ` Waiman Long
2018-11-29 16:06     ` Peter Zijlstra
2018-11-29 17:02       ` Waiman Long
2018-11-29 17:27         ` Peter Zijlstra
2018-11-29 17:58           ` Waiman Long
2018-11-29 18:13             ` Peter Zijlstra
2018-11-29 18:17               ` Davidlohr Bueso
2018-11-29 18:08           ` Peter Zijlstra
2018-11-29 18:26             ` Waiman Long
2018-11-29 18:31               ` Will Deacon
2018-11-29 18:34                 ` Waiman Long
2018-11-29 22:05                   ` Peter Zijlstra
2018-11-30  9:34                     ` 答复: " Liu,Qi(ACU-T1)
2018-11-30 14:15                       ` Peter Zijlstra
2018-11-29 21:30               ` Davidlohr Bueso
2018-11-29 21:34                 ` Davidlohr Bueso
2018-11-29 22:17                   ` Peter Zijlstra
2018-11-30  9:30                     ` Andrea Parri
2018-12-03  5:31                     ` [PATCH -tip] kernel/sched,wake_q: Branch predict wake_q_add() cmpxchg Davidlohr Bueso
2018-12-03 16:10                       ` Waiman Long
2019-01-21 11:28                       ` [tip:locking/core] sched/wake_q: Add branch prediction hint to " tip-bot for Davidlohr Bueso
2018-12-10 15:12                     ` [RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil Yongji Xie
2018-12-17 11:37                       ` Peter Zijlstra
2018-12-17 13:12                         ` Yongji Xie
2019-01-07 14:35                           ` Waiman Long
2019-01-07 15:31                             ` Peter Zijlstra
2019-01-07 15:35                               ` Waiman Long
2018-12-17 20:53                         ` Davidlohr Bueso
2018-12-18 13:10                           ` Peter Zijlstra
2018-12-18 13:14                             ` Peter Zijlstra
2018-12-18 17:27                               ` Davidlohr Bueso
2018-12-18 18:54                               ` [PATCH v2] sched/wake_q: Reduce reference counting for special users Davidlohr Bueso
2018-12-18 19:17                                 ` Waiman Long
2018-12-18 19:30                                   ` Davidlohr Bueso
2018-12-18 19:39                                     ` Davidlohr Bueso
2018-12-18 19:53                                       ` [PATCH v4] " Davidlohr Bueso
2018-12-18 20:35                                         ` Waiman Long
2019-01-21 16:02                                           ` Davidlohr Bueso
2019-01-22  8:55                                             ` Peter Zijlstra
2019-02-04  8:57                                         ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2019-02-07 19:30                                           ` Davidlohr Bueso
2019-02-12 14:14                                           ` Daniel Vacek
2019-01-21 11:28 ` [tip:locking/core] locking/rwsem: Fix (possible) missed wakeup tip-bot for Xie Yongji

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).