linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yongji Xie <elohimes@gmail.com>
To: peterz@infradead.org, mingo@redhat.com, will.deacon@arm.com
Cc: linux-kernel@vger.kernel.org, xieyongji@baidu.com,
	zhangyu31@baidu.com, liuqi16@baidu.com, yuanlinsi01@baidu.com,
	nixun@baidu.com, lilin24@baidu.com
Subject: [RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil
Date: Thu, 29 Nov 2018 20:50:30 +0800	[thread overview]
Message-ID: <1543495830-2644-1-git-send-email-xieyongji@baidu.com> (raw)

From: Xie Yongji <xieyongji@baidu.com>

Our system encountered a problem recently, the khungtaskd detected
some process hang on mmap_sem. But the odd thing was that one task which
is not on mmap_sem.wait_list still sleeps in rwsem_down_read_failed().
Through code inspection, we found a potential bug can lead to this.

Imaging this:

Thread 1                                  Thread 2
                                          down_write();
rwsem_down_read_failed()
 raw_spin_lock_irq(&sem->wait_lock);
 list_add_tail(&waiter.list, &wait_list);
 raw_spin_unlock_irq(&sem->wait_lock);
                                          __up_write();
                                           rwsem_wake();
                                            __rwsem_mark_wake();
                                             wake_q_add();
                                             list_del(&waiter->list);
                                             waiter->task = NULL;
 while (true) {
  set_current_state(TASK_UNINTERRUPTIBLE);
  if (!waiter.task) // true
      break;
 }
 __set_current_state(TASK_RUNNING);

Now Thread 1 is queued in Thread 2's wake_q without sleeping. Then
Thread 1 call rwsem_down_read_failed() again because Thread 3
hold the lock, if Thread 3 tries to queue Thread 1 before Thread 2
do wakeup, it will fail and miss wakeup:

Thread 1                                  Thread 2      Thread 3
                                                        down_write();
rwsem_down_read_failed()
 raw_spin_lock_irq(&sem->wait_lock);
 list_add_tail(&waiter.list, &wait_list);
 raw_spin_unlock_irq(&sem->wait_lock);
                                                        __rwsem_mark_wake();
                                                         wake_q_add();
                                          wake_up_q();
                                                         waiter->task = NULL;
 while (true) {
  set_current_state(TASK_UNINTERRUPTIBLE);
  if (!waiter.task) // false
      break;
  schedule();
 }
                                                        wake_up_q(&wake_q);

In another word, that means we might issue the wakeup before setting the reader
waiter to nil. If so, the wakeup may do nothing when it was called before reader
set task state to TASK_UNINTERRUPTIBLE. Then we would have no chance to wake up
the reader any more, and cause other writers such as "ps" command stuck on it.

This patch is not verified because we still have no way to reproduce the problem.
But I'd like to ask for some comments from community firstly.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
---
 kernel/locking/rwsem-xadd.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 09b1800..50d9af6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -198,15 +198,22 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		woken++;
 		tsk = waiter->task;
 
-		wake_q_add(wake_q, tsk);
+		get_task_struct(tsk);
 		list_del(&waiter->list);
 		/*
-		 * Ensure that the last operation is setting the reader
+		 * Ensure calling get_task_struct() before setting the reader
 		 * waiter to nil such that rwsem_down_read_failed() cannot
 		 * race with do_exit() by always holding a reference count
 		 * to the task to wakeup.
 		 */
 		smp_store_release(&waiter->task, NULL);
+		/*
+		 * Ensure issuing the wakeup (either by us or someone else)
+		 * after setting the reader waiter to nil.
+		 */
+		wake_q_add(wake_q, tsk);
+		/* wake_q_add() already take the task ref */
+		put_task_struct(tsk);
 	}
 
 	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
-- 
2.2.3


             reply	other threads:[~2018-11-29 12:50 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-29 12:50 Yongji Xie [this message]
2018-11-29 13:12 ` [RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil Peter Zijlstra
2018-11-29 13:44   ` Peter Zijlstra
2018-11-29 14:02     ` Yongji Xie
2018-11-29 18:43     ` Davidlohr Bueso
2018-11-29 18:49       ` Waiman Long
2018-11-29 15:21   ` Waiman Long
2018-11-29 15:29     ` Waiman Long
2018-11-29 16:06     ` Peter Zijlstra
2018-11-29 17:02       ` Waiman Long
2018-11-29 17:27         ` Peter Zijlstra
2018-11-29 17:58           ` Waiman Long
2018-11-29 18:13             ` Peter Zijlstra
2018-11-29 18:17               ` Davidlohr Bueso
2018-11-29 18:08           ` Peter Zijlstra
2018-11-29 18:26             ` Waiman Long
2018-11-29 18:31               ` Will Deacon
2018-11-29 18:34                 ` Waiman Long
2018-11-29 22:05                   ` Peter Zijlstra
2018-11-30  9:34                     ` 答复: " Liu,Qi(ACU-T1)
2018-11-30 14:15                       ` Peter Zijlstra
2018-11-29 21:30               ` Davidlohr Bueso
2018-11-29 21:34                 ` Davidlohr Bueso
2018-11-29 22:17                   ` Peter Zijlstra
2018-11-30  9:30                     ` Andrea Parri
2018-12-03  5:31                     ` [PATCH -tip] kernel/sched,wake_q: Branch predict wake_q_add() cmpxchg Davidlohr Bueso
2018-12-03 16:10                       ` Waiman Long
2019-01-21 11:28                       ` [tip:locking/core] sched/wake_q: Add branch prediction hint to " tip-bot for Davidlohr Bueso
2018-12-10 15:12                     ` [RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil Yongji Xie
2018-12-17 11:37                       ` Peter Zijlstra
2018-12-17 13:12                         ` Yongji Xie
2019-01-07 14:35                           ` Waiman Long
2019-01-07 15:31                             ` Peter Zijlstra
2019-01-07 15:35                               ` Waiman Long
2018-12-17 20:53                         ` Davidlohr Bueso
2018-12-18 13:10                           ` Peter Zijlstra
2018-12-18 13:14                             ` Peter Zijlstra
2018-12-18 17:27                               ` Davidlohr Bueso
2018-12-18 18:54                               ` [PATCH v2] sched/wake_q: Reduce reference counting for special users Davidlohr Bueso
2018-12-18 19:17                                 ` Waiman Long
2018-12-18 19:30                                   ` Davidlohr Bueso
2018-12-18 19:39                                     ` Davidlohr Bueso
2018-12-18 19:53                                       ` [PATCH v4] " Davidlohr Bueso
2018-12-18 20:35                                         ` Waiman Long
2019-01-21 16:02                                           ` Davidlohr Bueso
2019-01-22  8:55                                             ` Peter Zijlstra
2019-02-04  8:57                                         ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2019-02-07 19:30                                           ` Davidlohr Bueso
2019-02-12 14:14                                           ` Daniel Vacek
2019-01-21 11:28 ` [tip:locking/core] locking/rwsem: Fix (possible) missed wakeup tip-bot for Xie Yongji

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1543495830-2644-1-git-send-email-xieyongji@baidu.com \
    --to=elohimes@gmail.com \
    --cc=lilin24@baidu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liuqi16@baidu.com \
    --cc=mingo@redhat.com \
    --cc=nixun@baidu.com \
    --cc=peterz@infradead.org \
    --cc=will.deacon@arm.com \
    --cc=xieyongji@baidu.com \
    --cc=yuanlinsi01@baidu.com \
    --cc=zhangyu31@baidu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).