* [PATCH v2] rwsem-spinlock: let rwsem write lock stealable
@ 2013-02-01 10:59 Yuanhan Liu
From: Yuanhan Liu @ 2013-02-01 10:59 UTC
  To: linux-kernel; +Cc: mingo, Yuanhan Liu, David Howells, Michel Lespinasse

We (Linux Kernel Performance project) found a regression introduced by
commit 5a505085f043 ("mm/rmap: Convert the struct anon_vma::mutex to an
rwsem"), which converted those mutex locks to rwsem write locks. The
semantics are the same, but the behavioral difference is quite huge in
some cases. After investigation, we found the root cause: mutexes
support lock stealing while rwsems don't. Here is the link to the
detailed regression report:
    https://lkml.org/lkml/2013/1/29/84

Ingo suggested adding write lock stealing to rwsem as well:
    "I think we should allow lock-steal between rwsem writers - that
     will not hurt fairness as most rwsem fairness concerns relate to
     reader vs. writer fairness"

And here is the rwsem-spinlock version.

With this patch, we got about a 2x performance increase on one test box
with the following aim7 workfile:
    FILESIZE: 1M
    POOLSIZE: 10M
    10 fork_test

/usr/bin/time output w/o patch           /usr/bin/time output with patch
----------------------------------------------------------------------------
Percent of CPU this job got: 369%        Percent of CPU this job got: 537%
Voluntary context switches: 640595016    Voluntary context switches: 157915561
----------------------------------------------------------------------------
As shown above, we got a 45% increase in CPU usage (537% vs. 369%) and
saved about 3/4 of the voluntary context switches (157915561 vs.
640595016).

Here is the .nr_running field for each CPU from /proc/sched_debug.

output w/o this patch:
----------------------
cpu 00:   0   0   ...   0   0   0   0   0   0   0   1   0   1 .... 0   0
cpu 01:   0   0   ...   1   0   0   0   0   0   1   1   0   1 .... 0   0
cpu 02:   0   0   ...   1   1   0   0   0   1   0   0   1   0 .... 1   1
cpu 03:   0   0   ...   0   1   0   0   0   1   1   0   1   1 .... 0   0
cpu 04:   0   1   ...   0   0   2   1   1   2   1   0   1   0 .... 1   0
cpu 05:   0   1   ...   0   0   2   1   1   2   1   1   1   1 .... 0   0
cpu 06:   0   0   ...   2   0   0   1   0   0   1   0   0   0 .... 0   0
cpu 07:   0   0   ...   2   0   0   0   1   0   1   1   0   0 .... 1   0
cpu 08:   0   0   ...   1   0   0   0   1   0   0   1   0   0 .... 0   1
cpu 09:   0   0   ...   1   0   0   0   1   0   0   1   0   0 .... 0   1
cpu 10:   0   0   ...   0   0   0   2   0   0   1   0   1   1 .... 1   2
cpu 11:   0   0   ...   0   0   0   2   2   0   1   0   1   0 .... 1   2
cpu 12:   0   0   ...   2   0   0   0   1   1   3   1   1   1 .... 1   0
cpu 13:   0   0   ...   2   0   0   0   1   1   3   1   1   0 .... 1   1
cpu 14:   0   0   ...   0   0   0   2   0   0   1   1   0   0 .... 1   0
cpu 15:   0   0   ...   1   0   0   2   0   0   1   1   0   0 .... 0   0

output with this patch:
-----------------------
cpu 00:   0   0   ...   1   1   2   1   1   1   2   1   1   1 .... 1   3
cpu 01:   0   0   ...   1   1   1   1   1   1   2   1   1   1 .... 1   3
cpu 02:   0   0   ...   2   2   3   2   0   2   1   2   1   1 .... 1   1
cpu 03:   0   0   ...   2   2   3   2   1   2   1   2   1   1 .... 1   1
cpu 04:   0   1   ...   2   0   0   1   0   1   3   1   1   1 .... 1   1
cpu 05:   0   1   ...   2   0   1   1   0   1   2   1   1   1 .... 1   1
cpu 06:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
cpu 07:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
cpu 08:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
cpu 09:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
cpu 10:   0   0   ...   1   1   1   0   0   1   1   1   1   1 .... 0   0
cpu 11:   0   0   ...   1   1   1   0   0   1   1   1   1   2 .... 1   0
cpu 12:   0   0   ...   1   1   1   0   1   1   0   0   0   1 .... 2   1
cpu 13:   0   0   ...   1   1   1   0   1   1   1   0   1   2 .... 2   0
cpu 14:   0   0   ...   2   0   0   0   0   1   1   1   1   1 .... 2   2
cpu 15:   0   0   ...   2   0   0   1   0   1   1   1   1   1 .... 2   2
------------------------------------------------------------------------
As you can see, the CPUs are kept much busier with this patch.
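
Each row above is one CPU's .nr_running values sampled over the run.
Below is a minimal C sketch of how such samples might be collected; it
is an illustration only (not part of the patch), and it assumes a
/proc/sched_debug layout ("cpu#N" header followed by an
".nr_running : <n>" line) that can vary between kernel versions.

/* Hypothetical sampler: print the per-CPU .nr_running values
 * exposed by /proc/sched_debug.  Layout assumptions may not hold
 * on every kernel version.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *fp = fopen("/proc/sched_debug", "r");
	char line[256];
	int cpu = -1;

	if (!fp) {
		perror("/proc/sched_debug");
		return 1;
	}

	while (fgets(line, sizeof(line), fp)) {
		/* remember which CPU section we are in */
		if (sscanf(line, "cpu#%d", &cpu) == 1)
			continue;
		/* report the first .nr_running line after each cpu# header */
		if (cpu >= 0 && strstr(line, ".nr_running")) {
			long nr;
			char *colon = strchr(line, ':');

			if (colon && sscanf(colon + 1, "%ld", &nr) == 1)
				printf("cpu %02d: %ld\n", cpu, nr);
			cpu = -1;
		}
	}
	fclose(fp);
	return 0;
}

Running something like this once a second during the benchmark produces
per-CPU sample series like the rows shown above.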

v2: make the lock stealable in __down_write_trylock as well, as pointed out by Michel

Reported-by: LKP project <lkp@linux.intel.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 lib/rwsem-spinlock.c |   69 +++++++++++++++++--------------------------------
 1 files changed, 24 insertions(+), 45 deletions(-)

diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
index 7e0d6a5..7542afb 100644
--- a/lib/rwsem-spinlock.c
+++ b/lib/rwsem-spinlock.c
@@ -73,20 +73,13 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
 		goto dont_wake_writers;
 	}
 
-	/* if we are allowed to wake writers try to grant a single write lock
-	 * if there's a writer at the front of the queue
-	 * - we leave the 'waiting count' incremented to signify potential
-	 *   contention
+	/*
+	 * Since we support write lock stealing, we cannot set sem->activity
+	 * to -1 here to hand the lock over. Instead, we just wake the
+	 * waiting writer up and let it race for the lock again.
 	 */
 	if (waiter->flags & RWSEM_WAITING_FOR_WRITE) {
-		sem->activity = -1;
-		list_del(&waiter->list);
-		tsk = waiter->task;
-		/* Don't touch waiter after ->task has been NULLed */
-		smp_mb();
-		waiter->task = NULL;
-		wake_up_process(tsk);
-		put_task_struct(tsk);
+		wake_up_process(waiter->task);
 		goto out;
 	}
 
@@ -121,18 +114,10 @@ static inline struct rw_semaphore *
 __rwsem_wake_one_writer(struct rw_semaphore *sem)
 {
 	struct rwsem_waiter *waiter;
-	struct task_struct *tsk;
-
-	sem->activity = -1;
 
 	waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
-	list_del(&waiter->list);
+	wake_up_process(waiter->task);
 
-	tsk = waiter->task;
-	smp_mb();
-	waiter->task = NULL;
-	wake_up_process(tsk);
-	put_task_struct(tsk);
 	return sem;
 }
 
@@ -204,7 +189,6 @@ int __down_read_trylock(struct rw_semaphore *sem)
 
 /*
  * get a write lock on the semaphore
- * - we increment the waiting count anyway to indicate an exclusive lock
  */
 void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 {
@@ -214,37 +198,32 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
-	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
-		/* granted */
-		sem->activity = -1;
-		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
-		goto out;
-	}
-
-	tsk = current;
-	set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-
 	/* set up my own style of waitqueue */
+	tsk = current;
 	waiter.task = tsk;
 	waiter.flags = RWSEM_WAITING_FOR_WRITE;
-	get_task_struct(tsk);
-
 	list_add_tail(&waiter.list, &sem->wait_list);
 
-	/* we don't need to touch the semaphore struct anymore */
-	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
-
-	/* wait to be given the lock */
+	/* wait for someone to release the lock */
 	for (;;) {
-		if (!waiter.task)
+		/*
+		 * This is the key to write lock stealing: it lets a task that
+		 * is already on a CPU grab the lock as soon as it is free,
+		 * instead of going to sleep and waiting for itself (or the
+		 * waiter at the head of the list) to be woken up.
+		 */
+		if (sem->activity == 0)
 			break;
-		schedule();
 		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+		schedule();
+		raw_spin_lock_irqsave(&sem->wait_lock, flags);
 	}
+	/* got the lock */
+	sem->activity = -1;
+	list_del(&waiter.list);
 
-	tsk->state = TASK_RUNNING;
- out:
-	;
+	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 }
 
 void __sched __down_write(struct rw_semaphore *sem)
@@ -262,8 +241,8 @@ int __down_write_trylock(struct rw_semaphore *sem)
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
-	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
-		/* granted */
+	if (sem->activity == 0) {
+		/* got the lock */
 		sem->activity = -1;
 		ret = 1;
 	}
-- 
1.7.7.6
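
For readers less familiar with the pattern, below is a minimal
userspace model of the stealing loop in __down_write_nested() above.
It is a sketch only: pthread-based rather than kernel code, with
invented names (model_rwsem, model_down_write), and it leaves out the
wait list and the trylock path. The point it illustrates is that a
waiter now re-checks sem->activity under the wait lock after every
wakeup instead of being handed the lock by the waker, so a writer that
is already running can take the lock first.

/*
 * Userspace model of the new writer path (illustration only).
 * "activity" plays the role of sem->activity: 0 = free, -1 = write-locked.
 */
#include <pthread.h>

struct model_rwsem {
	pthread_mutex_t wait_lock;	/* stands in for sem->wait_lock */
	pthread_cond_t  wakeup;		/* stands in for the wait list + wake_up_process() */
	int		activity;
};

void model_down_write(struct model_rwsem *sem)
{
	pthread_mutex_lock(&sem->wait_lock);
	/*
	 * Re-check "activity" under the lock after every wakeup instead of
	 * waiting to be handed the lock: whoever sees 0 first wins, so a
	 * writer already on a CPU can steal the lock from a queued waiter.
	 */
	while (sem->activity != 0)
		pthread_cond_wait(&sem->wakeup, &sem->wait_lock);
	sem->activity = -1;		/* got the lock */
	pthread_mutex_unlock(&sem->wait_lock);
}

void model_up_write(struct model_rwsem *sem)
{
	pthread_mutex_lock(&sem->wait_lock);
	sem->activity = 0;
	/* wake one waiter; it still has to win the race for "activity" */
	pthread_cond_signal(&sem->wakeup);
	pthread_mutex_unlock(&sem->wait_lock);
}

In the old code the waker handed the lock over directly (set
sem->activity to -1, removed the waiter and cleared waiter.task); here
the waker only signals that the lock may be free, and ownership goes to
whichever writer observes activity == 0 first.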



* Re: [PATCH v2] rwsem-spinlock: let rwsem write lock stealable
From: Yuanhan Liu @ 2013-02-16  9:08 UTC
  To: mingo; +Cc: linux-kernel, David Howells, Michel Lespinasse

Hi Ingo, 

Ping...

On Fri, Feb 01, 2013 at 06:59:16PM +0800, Yuanhan Liu wrote:
> [ full v2 patch quoted above; snipped ]

* [tip:core/locking] rwsem-spinlock: Implement writer lock-stealing for better scalability
From: tip-bot for Yuanhan Liu @ 2013-02-18 16:25 UTC
  To: linux-tip-commits
  Cc: linux-kernel, anton, hpa, mingo, arjan, a.p.zijlstra, torvalds,
	alex.shi, yuanhan.liu, dhowells, akpm, tglx, walken, lkp

Commit-ID:  5dae63c442131f1b0a66abd43fdc861031f13ca6
Gitweb:     http://git.kernel.org/tip/5dae63c442131f1b0a66abd43fdc861031f13ca6
Author:     Yuanhan Liu <yuanhan.liu@linux.intel.com>
AuthorDate: Fri, 1 Feb 2013 18:59:16 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 18 Feb 2013 10:10:21 +0100

rwsem-spinlock: Implement writer lock-stealing for better scalability

We (Linux Kernel Performance project) found a regression
introduced by commit:

  5a505085f043 mm/rmap: Convert the struct anon_vma::mutex to an rwsem

which converted all anon_vma::mutex locks to rwsem write locks.

The semantics are the same, but the behavioral difference is
quite huge in some cases. After investigating it we found the
root cause: mutexes support lock stealing while rwsems don't.

Here is the link for the detailed regression report:

  https://lkml.org/lkml/2013/1/29/84

Ingo suggested adding write lock stealing to rwsems:

    "I think we should allow lock-steal between rwsem writers - that
     will not hurt fairness as most rwsem fairness concerns relate to
     reader vs. writer fairness"

And here is the rwsem-spinlock version.

With this patch, we got about a 2x performance increase in one
test box with the following aim7 workfile:

    FILESIZE: 1M
    POOLSIZE: 10M
    10 fork_test

 /usr/bin/time output w/o patch                       /usr/bin/time output with patch
 Percent of CPU this job got: 369%                    Percent of CPU this job got: 537%
 Voluntary context switches: 640595016                Voluntary context switches: 157915561

We got a 45% increase in CPU usage and saved about 3/4 of the voluntary context switches.

Reported-by: LKP project <lkp@linux.intel.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Alex Shi <alex.shi@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/1359716356-23865-1-git-send-email-yuanhan.liu@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 lib/rwsem-spinlock.c | 69 ++++++++++++++++++----------------------------------
 1 file changed, 24 insertions(+), 45 deletions(-)

diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
index 7e0d6a5..7542afb 100644
--- a/lib/rwsem-spinlock.c
+++ b/lib/rwsem-spinlock.c
@@ -73,20 +73,13 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
 		goto dont_wake_writers;
 	}
 
-	/* if we are allowed to wake writers try to grant a single write lock
-	 * if there's a writer at the front of the queue
-	 * - we leave the 'waiting count' incremented to signify potential
-	 *   contention
+	/*
+	 * Since we support write lock stealing, we cannot set sem->activity
+	 * to -1 here to hand the lock over. Instead, we just wake the
+	 * waiting writer up and let it race for the lock again.
 	 */
 	if (waiter->flags & RWSEM_WAITING_FOR_WRITE) {
-		sem->activity = -1;
-		list_del(&waiter->list);
-		tsk = waiter->task;
-		/* Don't touch waiter after ->task has been NULLed */
-		smp_mb();
-		waiter->task = NULL;
-		wake_up_process(tsk);
-		put_task_struct(tsk);
+		wake_up_process(waiter->task);
 		goto out;
 	}
 
@@ -121,18 +114,10 @@ static inline struct rw_semaphore *
 __rwsem_wake_one_writer(struct rw_semaphore *sem)
 {
 	struct rwsem_waiter *waiter;
-	struct task_struct *tsk;
-
-	sem->activity = -1;
 
 	waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
-	list_del(&waiter->list);
+	wake_up_process(waiter->task);
 
-	tsk = waiter->task;
-	smp_mb();
-	waiter->task = NULL;
-	wake_up_process(tsk);
-	put_task_struct(tsk);
 	return sem;
 }
 
@@ -204,7 +189,6 @@ int __down_read_trylock(struct rw_semaphore *sem)
 
 /*
  * get a write lock on the semaphore
- * - we increment the waiting count anyway to indicate an exclusive lock
  */
 void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 {
@@ -214,37 +198,32 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
-	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
-		/* granted */
-		sem->activity = -1;
-		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
-		goto out;
-	}
-
-	tsk = current;
-	set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-
 	/* set up my own style of waitqueue */
+	tsk = current;
 	waiter.task = tsk;
 	waiter.flags = RWSEM_WAITING_FOR_WRITE;
-	get_task_struct(tsk);
-
 	list_add_tail(&waiter.list, &sem->wait_list);
 
-	/* we don't need to touch the semaphore struct anymore */
-	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
-
-	/* wait to be given the lock */
+	/* wait for someone to release the lock */
 	for (;;) {
-		if (!waiter.task)
+		/*
+		 * This is the key to write lock stealing: it lets a task that
+		 * is already on a CPU grab the lock as soon as it is free,
+		 * instead of going to sleep and waiting for itself (or the
+		 * waiter at the head of the list) to be woken up.
+		 */
+		if (sem->activity == 0)
 			break;
-		schedule();
 		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+		schedule();
+		raw_spin_lock_irqsave(&sem->wait_lock, flags);
 	}
+	/* got the lock */
+	sem->activity = -1;
+	list_del(&waiter.list);
 
-	tsk->state = TASK_RUNNING;
- out:
-	;
+	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 }
 
 void __sched __down_write(struct rw_semaphore *sem)
@@ -262,8 +241,8 @@ int __down_write_trylock(struct rw_semaphore *sem)
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
-	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
-		/* granted */
+	if (sem->activity == 0) {
+		/* got the lock */
 		sem->activity = -1;
 		ret = 1;
 	}


* [tip:core/locking] rwsem-spinlock: Implement writer lock-stealing for better scalability
From: tip-bot for Yuanhan Liu @ 2013-02-22 12:37 UTC
  To: linux-tip-commits
  Cc: linux-kernel, anton, hpa, mingo, arjan, a.p.zijlstra, torvalds,
	alex.shi, yuanhan.liu, dhowells, akpm, tglx, walken, lkp

Commit-ID:  41ef8f826692c8f65882bec0a8211bd4d1d2d19a
Gitweb:     http://git.kernel.org/tip/41ef8f826692c8f65882bec0a8211bd4d1d2d19a
Author:     Yuanhan Liu <yuanhan.liu@linux.intel.com>
AuthorDate: Fri, 1 Feb 2013 18:59:16 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 19 Feb 2013 08:43:39 +0100

rwsem-spinlock: Implement writer lock-stealing for better scalability

[ Commit message and diff identical to the 2013-02-18 tip-bot notification above. ]
