From: Waiman Long <longman@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	Davidlohr Bueso <dave@stgolabs.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	huang ying <huang.ying.caritas@gmail.com>,
	Waiman Long <longman@redhat.com>,
	stable@vger.kernel.org
Subject: [PATCH-tip v7 01/20] locking/rwsem: Prevent decrement of reader count before increment
Date: Sun, 28 Apr 2019 17:25:38 -0400
Message-ID: <20190428212557.13482-2-longman@redhat.com>
In-Reply-To: <20190428212557.13482-1-longman@redhat.com>

During my rwsem testing, it was found that after a down_read(), the
reader count may occasionally become 0 or even negative. Consequently,
a writer may steal the lock at that time and execute in parallel with
the reader, thus breaking the mutual exclusion guarantee of the write
lock. In other words, both readers and a writer can become rwsem owners
simultaneously.

The current reader wakeup code clears waiter->task and puts the readers
into wake_q in a single pass, before the reader count has been fully
incremented. Once waiter->task is cleared, the corresponding reader may
see it, finish its critical section and unlock, decrementing the count
before the count is incremented. This is not a problem if there is only
one reader to wake up, as the count has been pre-incremented by 1. It is
a problem if there is more than one reader to be woken up, because a
writer can then steal the lock.
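
The ordering problem can be illustrated with a toy model. This is only a
sketch, not kernel code: `count` holds nothing but the number of active
readers, every woken reader is assumed to unlock immediately, and the
names `toy_trace` and `single_pass_wakeup` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical record of what the toy count did during one wakeup. */
struct toy_trace {
	long lowest;	/* lowest value the count reached */
	long final;	/* count after the late increment */
	bool exposed;	/* count hit <= 0 while readers were still queued */
};

/*
 * Toy model of the single-pass ordering, assuming nreaders >= 1.
 * The count starts pre-incremented by 1 for the first reader.
 */
struct toy_trace single_pass_wakeup(int nreaders)
{
	long count = 1;
	struct toy_trace t = { .lowest = count, .final = 0, .exposed = false };

	for (int i = 0; i < nreaders; i++) {
		/* waiter->task is cleared here: reader i runs and unlocks */
		count -= 1;
		if (count < t.lowest)
			t.lowest = count;
		/* a writer looking now sees no readers, yet more will be woken */
		if (count <= 0 && i < nreaders - 1)
			t.exposed = true;
	}

	/* the remaining increment only arrives after the loop */
	count += nreaders - 1;
	t.final = count;
	return t;
}
```

In this model, one reader never exposes the lock, while two readers
drive the count to 0 and then to -1 before the late increment brings it
back to 0.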

The wakeup was actually done in two passes before the v4.9 commit
70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only
once"). To fix this problem, the wakeup is now done in two passes
again. In the first pass, we collect the readers and count them. The
reader count is then fully incremented. In the second pass,
waiter->task is cleared for each collected reader and the readers are
put into wake_q to be woken up later.
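
The two-pass ordering can be sketched as follows. This is a simplified
illustration rather than the patch itself: it uses a plain array instead
of the kernel's list_head and wake_q, `count` tracks active readers only,
and `toy_waiter`, `toy_type` and `toy_mark_wake` are hypothetical names.

```c
#include <stddef.h>

enum toy_type { TOY_READ, TOY_WRITE };

struct toy_waiter {
	enum toy_type type;
	int task;	/* nonzero while waiting; stands in for waiter->task */
};

/*
 * Wake the leading run of readers in q[0..n). When the head waiter is a
 * reader, *count is assumed to be pre-incremented by 1 on its behalf.
 * Returns the number of readers woken.
 */
size_t toy_mark_wake(struct toy_waiter *q, size_t n, long *count)
{
	size_t woken = 0;

	/* Pass 1: count the leading readers, stopping at the first writer. */
	while (woken < n && q[woken].type == TOY_READ)
		woken++;

	/* Fully increment the count before any reader can see its wakeup. */
	if (woken > 0)
		*count += (long)woken - 1;

	/* Pass 2: only now clear the task field to release each reader. */
	for (size_t i = 0; i < woken; i++)
		q[i].task = 0;

	return woken;
}
```

Because the increment sits between the two loops, a reader released in
the second pass can unlock at once without the count dropping below the
number of readers that are still entitled to the lock.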

Fixes: 70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only once")
Cc: <stable@vger.kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 46 +++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 6b3ee9948bf1..0b1f77957240 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -130,6 +130,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 {
 	struct rwsem_waiter *waiter, *tmp;
 	long oldcount, woken = 0, adjustment = 0;
+	struct list_head wlist;
 
 	/*
 	 * Take a peek at the queue head waiter such that we can determine
@@ -188,18 +189,43 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	 * of the queue. We know that woken will be at least 1 as we accounted
 	 * for above. Note we increment the 'active part' of the count by the
 	 * number of readers before waking any processes up.
+	 *
+	 * We have to do wakeup in 2 passes to prevent the possibility that
+	 * the reader count may be decremented before it is incremented. It
+	 * is because the to-be-woken waiter may not have slept yet. So it
+	 * may see waiter->task got cleared, finish its critical section and
+	 * do an unlock before the reader count increment.
+	 *
+	 * 1) Collect the read-waiters in a separate list, count them and
+	 *    fully increment the reader count in rwsem.
+	 * 2) For each waiter in the new list, clear waiter->task and
+	 *    queue the waiter on wake_q to be woken up later.
 	 */
-	list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
-		struct task_struct *tsk;
-
+	list_for_each_entry(waiter, &sem->wait_list, list) {
 		if (waiter->type == RWSEM_WAITING_FOR_WRITE)
 			break;
 
 		woken++;
-		tsk = waiter->task;
+	}
+	list_cut_before(&wlist, &sem->wait_list, &waiter->list);
+
+	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
+	lockevent_cond_inc(rwsem_wake_reader, woken);
+	if (list_empty(&sem->wait_list)) {
+		/* hit end of list above */
+		adjustment -= RWSEM_WAITING_BIAS;
+	}
+
+	if (adjustment)
+		atomic_long_add(adjustment, &sem->count);
+
+	/* 2nd pass */
+	list_for_each_entry_safe(waiter, tmp, &wlist, list) {
+		struct task_struct *tsk;
 
+		tsk = waiter->task;
 		get_task_struct(tsk);
-		list_del(&waiter->list);
+
 		/*
 		 * Ensure calling get_task_struct() before setting the reader
 		 * waiter to nil such that rwsem_down_read_failed() cannot
@@ -213,16 +239,6 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		 */
 		wake_q_add_safe(wake_q, tsk);
 	}
-
-	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
-	lockevent_cond_inc(rwsem_wake_reader, woken);
-	if (list_empty(&sem->wait_list)) {
-		/* hit end of list above */
-		adjustment -= RWSEM_WAITING_BIAS;
-	}
-
-	if (adjustment)
-		atomic_long_add(adjustment, &sem->count);
 }
 
 /*
-- 
2.18.1


Thread overview: 41+ messages
2019-04-28 21:25 [PATCH-tip v7 00/20] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
2019-04-28 21:25 ` Waiman Long [this message]
2019-05-03 12:06   ` [PATCH-tip v7 01/20] locking/rwsem: Prevent decrement of reader count before increment Peter Zijlstra
2019-05-03 13:32     ` Waiman Long
2019-05-07  7:07   ` [tip:locking/urgent] " tip-bot for Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 02/20] locking/rwsem: Make owner available even if !CONFIG_RWSEM_SPIN_ON_OWNER Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 03/20] locking/rwsem: Remove rwsem_wake() wakeup optimization Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 04/20] locking/rwsem: Implement a new locking scheme Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 05/20] locking/rwsem: Merge rwsem.h and rwsem-xadd.c into rwsem.c Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 06/20] locking/rwsem: Code cleanup after files merging Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 07/20] locking/rwsem: Make rwsem_spin_on_owner() return owner state Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 08/20] locking/rwsem: Implement lock handoff to prevent lock starvation Waiman Long
2019-05-03 13:10   ` Peter Zijlstra
2019-05-03 13:57     ` Waiman Long
2019-05-03 14:37     ` David Laight
2019-04-28 21:25 ` [PATCH-tip v7 09/20] locking/rwsem: Always release wait_lock before waking up tasks Waiman Long
2019-05-03 13:37   ` Peter Zijlstra
2019-05-03 13:56     ` Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 10/20] locking/rwsem: More optimal RT task handling of null owner Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 11/20] locking/rwsem: Wake up almost all readers in wait queue Waiman Long
2019-05-03 16:51   ` Peter Zijlstra
2019-05-03 17:15     ` Waiman Long
2019-05-06 11:49       ` Peter Zijlstra
2019-04-28 21:25 ` [PATCH-tip v7 12/20] locking/rwsem: Clarify usage of owner's nonspinaable bit Waiman Long
2019-05-03 15:21   ` Peter Zijlstra
2019-05-03 15:26     ` Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 13/20] locking/rwsem: Enable readers spinning on writer Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 14/20] locking/rwsem: Enable time-based spinning on reader-owned rwsem Waiman Long
2019-05-06 15:47   ` Peter Zijlstra
2019-04-28 21:25 ` [PATCH-tip v7 15/20] locking/rwsem: Adaptive disabling of reader optimistic spinning Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 16/20] locking/rwsem: Add more rwsem owner access helpers Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 17/20] locking/rwsem: Guard against making count negative Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 18/20] locking/rwsem: Merge owner into count on x86-64 Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 19/20] locking/rwsem: Remove redundant computation of writer lock word Waiman Long
2019-04-28 21:25 ` [PATCH-tip v7 20/20] locking/rwsem: Disable preemption in down_read*() if owner in count Waiman Long
2019-04-28 22:46 ` [PATCH-tip v7 00/20] locking/rwsem: Rwsem rearchitecture part 2 Linus Torvalds
2019-04-28 23:12   ` Waiman Long
2019-04-28 23:19     ` Waiman Long
2019-04-29  0:10     ` Linus Torvalds
2019-04-29  0:27       ` Waiman Long
2019-04-29  2:41         ` Linus Torvalds
