LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Davidlohr Bueso <dave@stgolabs.net>
To: peterz@infradead.org, mingo@kernel.org
Cc: Waiman.Long@hp.com, jason.low2@hpe.com, wanpeng.li@hotmail.com,
	dave@stgolabs.net, linux-kernel@vger.kernel.org,
	Davidlohr Bueso <dbueso@suse.de>
Subject: [PATCH 3/3] locking/rwsem: Scan the wait_list for readers only once
Date: Fri,  5 Aug 2016 01:04:45 -0700
Message-ID: <1470384285-32163-4-git-send-email-dave@stgolabs.net> (raw)
In-Reply-To: <1470384285-32163-1-git-send-email-dave@stgolabs.net>

When wanting to wakeup readers, __rwsem_mark_wakeup() currently
iterates the wait_list twice while looking to wakeup the first N
queued reader-tasks. While this can be quite inefficient, it was
there such that a awoken reader would be first and foremost
acknowledged by the lock counter.

Keeping the same logic, we can further benefit from the use of
wake_qs and avoid entirely the first wait_list iteration that sets
the counter as wake_up_process() isn't going to occur right away,
and therefore we maintain the counter->list order of going about
things.

Other than saving cycles with O(n) "scanning", this change also
nicely cleans up a good chunk of __rwsem_mark_wakeup(); both
visually and less tedious to read.

For example, the following improvements where seen on some will
it scale microbenchmarks, on a 48-core Haswell:

                                     v4.7              v4.7-rwsem-v1
Hmean    signal1-processes-8    5792691.42 (  0.00%)  5771971.04 ( -0.36%)
Hmean    signal1-processes-12   6081199.96 (  0.00%)  6072174.38 ( -0.15%)
Hmean    signal1-processes-21   3071137.71 (  0.00%)  3041336.72 ( -0.97%)
Hmean    signal1-processes-48   3712039.98 (  0.00%)  3708113.59 ( -0.11%)
Hmean    signal1-processes-79   4464573.45 (  0.00%)  4682798.66 (  4.89%)
Hmean    signal1-processes-110  4486842.01 (  0.00%)  4633781.71 (  3.27%)
Hmean    signal1-processes-141  4611816.83 (  0.00%)  4692725.38 (  1.75%)
Hmean    signal1-processes-172  4638157.05 (  0.00%)  4714387.86 (  1.64%)
Hmean    signal1-processes-203  4465077.80 (  0.00%)  4690348.07 (  5.05%)
Hmean    signal1-processes-224  4410433.74 (  0.00%)  4687534.43 (  6.28%)

Stddev   signal1-processes-8       6360.47 (  0.00%)     8455.31 ( 32.94%)
Stddev   signal1-processes-12      4004.98 (  0.00%)     9156.13 (128.62%)
Stddev   signal1-processes-21      3273.14 (  0.00%)     5016.80 ( 53.27%)
Stddev   signal1-processes-48     28420.25 (  0.00%)    26576.22 ( -6.49%)
Stddev   signal1-processes-79     22038.34 (  0.00%)    18992.70 (-13.82%)
Stddev   signal1-processes-110    23226.93 (  0.00%)    17245.79 (-25.75%)
Stddev   signal1-processes-141     6358.98 (  0.00%)     7636.14 ( 20.08%)
Stddev   signal1-processes-172     9523.70 (  0.00%)     4824.75 (-49.34%)
Stddev   signal1-processes-203    13915.33 (  0.00%)     9326.33 (-32.98%)
Stddev   signal1-processes-224    15573.94 (  0.00%)    10613.82 (-31.85%)

Other runs that saw improvements include context_switch and pipe; and
as expected, this is particularly highlighted on larger thread counts
as it becomes more expensive to walk the list twice.

No change in wakeup ordering or semantics.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/rwsem-xadd.c | 58 ++++++++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 32 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index e02fe3289b5a..2337b4bb2366 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -125,12 +125,14 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 			      enum rwsem_wake_type wake_type,
 			      struct wake_q_head *wake_q)
 {
-	struct rwsem_waiter *waiter;
-	struct task_struct *tsk;
-	struct list_head *next;
-	long loop, oldcount, woken = 0, adjustment = 0;
+	struct rwsem_waiter *waiter, *tmp;
+	long oldcount, woken = 0, adjustment = 0;
 
-	waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
+	/*
+	 * Take a peek at the queue head waiter such that we can determine
+	 * the wakeup(s) to perform.
+	 */
+	waiter = list_first_entry(&sem->wait_list, struct rwsem_waiter, list);
 
 	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
 		if (wake_type == RWSEM_WAKE_ANY) {
@@ -180,36 +182,21 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 
 	/*
 	 * Grant an infinite number of read locks to the readers at the front
-	 * of the queue.  Note we increment the 'active part' of the count by
-	 * the number of readers before waking any processes up.
+	 * of the queue. We know that woken will be at least 1 as we accounted
+	 * for above. Note we increment the 'active part' of the count by the
+	 * number of readers before waking any processes up.
 	 */
-	do {
-		woken++;
+	list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
+		struct task_struct *tsk;
 
-		if (waiter->list.next == &sem->wait_list)
+		if (waiter->type == RWSEM_WAITING_FOR_WRITE)
 			break;
 
-		waiter = list_entry(waiter->list.next,
-					struct rwsem_waiter, list);
-
-	} while (waiter->type != RWSEM_WAITING_FOR_WRITE);
-
-	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
-	if (waiter->type != RWSEM_WAITING_FOR_WRITE)
-		/* hit end of list above */
-		adjustment -= RWSEM_WAITING_BIAS;
-
-	if (adjustment)
-		atomic_long_add(adjustment, &sem->count);
-
-	next = sem->wait_list.next;
-	loop = woken;
-	do {
-		waiter = list_entry(next, struct rwsem_waiter, list);
-		next = waiter->list.next;
+		woken++;
 		tsk = waiter->task;
 
 		wake_q_add(wake_q, tsk);
+		list_del(&waiter->list);
 		/*
 		 * Ensure that the last operation is setting the reader
 		 * waiter to nil such that rwsem_down_read_failed() cannot
@@ -217,10 +204,16 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		 * to the task to wakeup.
 		 */
 		smp_store_release(&waiter->task, NULL);
-	} while (--loop);
+	}
 
-	sem->wait_list.next = next;
-	next->prev = &sem->wait_list;
+	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
+	if (list_empty(&sem->wait_list)) {
+		/* hit end of list above */
+		adjustment -= RWSEM_WAITING_BIAS;
+	}
+
+	if (adjustment)
+		atomic_long_add(adjustment, &sem->count);
 }
 
 /*
@@ -245,7 +238,8 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 	/* we're now waiting on the lock, but no longer actively locking */
 	count = atomic_long_add_return(adjustment, &sem->count);
 
-	/* If there are no active locks, wake the front queued process(es).
+	/*
+	 * If there are no active locks, wake the front queued process(es).
 	 *
 	 * If there are no writers and we are first in the queue,
 	 * wake our own waiter to join the existing active readers !
-- 
2.6.6

  parent reply index

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-05  8:04 [PATCH 0/3] locking/rwsem: __rwsem_mark_wake() improvements Davidlohr Bueso
2016-08-05  8:04 ` [PATCH 1/3] locking/rwsem: Return void in __rwsem_mark_wake() Davidlohr Bueso
2016-08-18 11:00   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2016-08-18 13:41   ` tip-bot for Davidlohr Bueso
2016-08-05  8:04 ` [PATCH 2/3] locking/rwsem: Remove a few useless comments Davidlohr Bueso
2016-08-18 11:00   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2016-08-18 13:42   ` tip-bot for Davidlohr Bueso
2016-08-05  8:04 ` Davidlohr Bueso [this message]
2016-08-06  6:05   ` [PATCH 3/3] locking/rwsem: Scan the wait_list for readers only once Ingo Molnar
2016-08-08 16:37     ` Davidlohr Bueso
2016-08-18 11:01   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2016-08-18 13:42   ` tip-bot for Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1470384285-32163-4-git-send-email-dave@stgolabs.net \
    --to=dave@stgolabs.net \
    --cc=Waiman.Long@hp.com \
    --cc=dbueso@suse.de \
    --cc=jason.low2@hpe.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git