linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/11] introduce down_write_killable for rw_semaphore
@ 2016-02-29 12:58 Michal Hocko
  2016-02-29 12:58 ` [PATCH 01/11] locking, rwsem: get rid of __down_write_nested Michal Hocko
                   ` (11 more replies)
  0 siblings, 12 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

Hi,
I have posted this series [1] previously and there didn't seem to be
any fundamental objections, except from the x86 maintainers, who
preferred the asm-based __down_write implementation; I have reworked it
accordingly for this series. Apart from that I have refreshed the
series on top of the current linux-next and fixed one unused variable
warning which happened to trigger a compilation failure on ppc.

The following patchset implements a killable variant of the write lock
for rw_semaphore. My usecase is to convert as many mmap_sem write users
as possible to the killable variant, which will be helpful for the
oom_reaper [2] when asynchronously tearing down the oom victim's
address space, an operation which requires mmap_sem for read. This will
reduce the likelihood of OOM livelocks caused by the oom victim being
stuck on a lock or another resource, which prevents it from reaching
its exit path and releasing its memory. I haven't implemented the
killable variant of the read lock because I do not have any usecase for
this API.
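
For illustration, a typical conversion of an mmap_sem write user would
follow this pattern (a minimal sketch only; the actual conversions are
in the follow-up series mentioned below):

	/* before: down_write(&mm->mmap_sem); */
	if (down_write_killable(&mm->mmap_sem))
		return -EINTR;	/* interrupted by a fatal signal */
	/* ... modify the address space ... */
	up_write(&mm->mmap_sem);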

The patchset is organized as follows.
- Patch 1 is a trivial cleanup.
- Patch 2, I believe, shouldn't introduce any functional changes as per
  Documentation/memory-barriers.txt.
- Patch 3 is the preparatory work and necessary infrastructure for
  down_write_killable. It implements the generic __down_write_killable
  and prepares the write lock slow path to bail out early when told to
  do so.
- Patches 4-10 implement the arch-specific __down_write_killable, one
  patch per architecture.
- Finally, patch 11 implements down_write_killable and ties everything
  together. I am not really an expert on lockdep so I hope I got it
  right.

Many of the arch-specific patches are basically the same, and I can
squash them into a single patch if that is preferred, but I thought
that one patch per arch would be easier to review.

My patch converting the mmap_sem write users to the killable form is
not part of this series because it is a larger change and I would like
to avoid spamming people too much. I will post that series shortly.

I have tested on x86 with OOM situations with high mmap_sem contention
(basically many parallel page faults racing with many parallel
mmap/munmap tight loops), so the waiters for the write lock are
routinely interrupted by SIGKILL.

Patches should apply cleanly on both Linus' tree and linux-next.

[1] http://lkml.kernel.org/r/1454444369-2146-1-git-send-email-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1452094975-551-1-git-send-email-mhocko@kernel.org

Any feedback is highly appreciated.

* [PATCH 01/11] locking, rwsem: get rid of __down_write_nested
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
@ 2016-02-29 12:58 ` Michal Hocko
  2016-02-29 12:58 ` [PATCH 02/11] locking, rwsem: drop explicit memory barriers Michal Hocko
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

__down_write_nested is no longer used anywhere else and its only
remaining caller (__down_write) uses 0 as the subclass. Ditch
__down_write_nested to make the code easier to follow.

This shouldn't introduce any functional change.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/include/asm/rwsem.h   | 7 +------
 arch/sh/include/asm/rwsem.h     | 5 -----
 arch/sparc/include/asm/rwsem.h  | 7 +------
 arch/x86/include/asm/rwsem.h    | 7 +------
 include/asm-generic/rwsem.h     | 7 +------
 include/linux/rwsem-spinlock.h  | 1 -
 kernel/locking/rwsem-spinlock.c | 7 +------
 7 files changed, 5 insertions(+), 36 deletions(-)

diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
index 4b43ee7e6776..8e52e72f3efc 100644
--- a/arch/s390/include/asm/rwsem.h
+++ b/arch/s390/include/asm/rwsem.h
@@ -90,7 +90,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	signed long old, new, tmp;
 
@@ -108,11 +108,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		rwsem_down_write_failed(sem);
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
index edab57265293..a5104bebd1eb 100644
--- a/arch/sh/include/asm/rwsem.h
+++ b/arch/sh/include/asm/rwsem.h
@@ -114,11 +114,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 		rwsem_downgrade_wake(sem);
 }
 
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
-{
-	__down_write(sem);
-}
-
 /*
  * implement exchange and add functionality
  */
diff --git a/arch/sparc/include/asm/rwsem.h b/arch/sparc/include/asm/rwsem.h
index 069bf4d663a1..e5a0d575bc7f 100644
--- a/arch/sparc/include/asm/rwsem.h
+++ b/arch/sparc/include/asm/rwsem.h
@@ -45,7 +45,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	long tmp;
 
@@ -55,11 +55,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		rwsem_down_write_failed(sem);
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index ceec86eb68e9..4a8292a0d6e1 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -99,7 +99,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	long tmp;
 	asm volatile("# beginning down_write\n\t"
@@ -116,11 +116,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		     : "memory", "cc");
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index d6d5dc98d7da..b8d8a6cf4ca8 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -53,7 +53,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	long tmp;
 
@@ -63,11 +63,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		rwsem_down_write_failed(sem);
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
diff --git a/include/linux/rwsem-spinlock.h b/include/linux/rwsem-spinlock.h
index 561e8615528d..a733a5467e6c 100644
--- a/include/linux/rwsem-spinlock.h
+++ b/include/linux/rwsem-spinlock.h
@@ -34,7 +34,6 @@ struct rw_semaphore {
 extern void __down_read(struct rw_semaphore *sem);
 extern int __down_read_trylock(struct rw_semaphore *sem);
 extern void __down_write(struct rw_semaphore *sem);
-extern void __down_write_nested(struct rw_semaphore *sem, int subclass);
 extern int __down_write_trylock(struct rw_semaphore *sem);
 extern void __up_read(struct rw_semaphore *sem);
 extern void __up_write(struct rw_semaphore *sem);
diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index 3a5048572065..bab26104a5d0 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -191,7 +191,7 @@ int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * get a write lock on the semaphore
  */
-void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
+void __sched __down_write(struct rw_semaphore *sem)
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
@@ -227,11 +227,6 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 }
 
-void __sched __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
-- 
2.7.0

* [PATCH 02/11] locking, rwsem: drop explicit memory barriers
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
  2016-02-29 12:58 ` [PATCH 01/11] locking, rwsem: get rid of __down_write_nested Michal Hocko
@ 2016-02-29 12:58 ` Michal Hocko
  2016-02-29 12:58 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

sh and xtensa seem to be the only architectures which use explicit
memory barriers for rw_semaphore operations. They are not really
needed because a full memory barrier is always implied by
atomic_{inc,dec,add,sub}_return and by cmpxchg. Remove them.
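
For illustration, this is the pattern the patch relies on (a minimal
sketch, not part of the patch itself):

	/*
	 * atomic_add_return() implies a full smp_mb() both before and
	 * after the update (see Documentation/memory-barriers.txt), so
	 * an explicit barrier on the success path is redundant.
	 */
	static inline void example_down_write(struct rw_semaphore *sem)
	{
		int tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
					    (atomic_t *)(&sem->count));
		if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
			rwsem_down_write_failed(sem);
		/* no smp_wmb() needed here */
	}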

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/sh/include/asm/rwsem.h     | 14 ++------------
 arch/xtensa/include/asm/rwsem.h | 14 ++------------
 2 files changed, 4 insertions(+), 24 deletions(-)

diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
index a5104bebd1eb..f6c951c7a875 100644
--- a/arch/sh/include/asm/rwsem.h
+++ b/arch/sh/include/asm/rwsem.h
@@ -24,9 +24,7 @@
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	if (atomic_inc_return((atomic_t *)(&sem->count)) > 0)
-		smp_wmb();
-	else
+	if (atomic_inc_return((atomic_t *)(&sem->count)) <= 0)
 		rwsem_down_read_failed(sem);
 }
 
@@ -37,7 +35,6 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	while ((tmp = sem->count) >= 0) {
 		if (tmp == cmpxchg(&sem->count, tmp,
 				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			smp_wmb();
 			return 1;
 		}
 	}
@@ -53,9 +50,7 @@ static inline void __down_write(struct rw_semaphore *sem)
 
 	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
 				(atomic_t *)(&sem->count));
-	if (tmp == RWSEM_ACTIVE_WRITE_BIAS)
-		smp_wmb();
-	else
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
 		rwsem_down_write_failed(sem);
 }
 
@@ -65,7 +60,6 @@ static inline int __down_write_trylock(struct rw_semaphore *sem)
 
 	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
 		      RWSEM_ACTIVE_WRITE_BIAS);
-	smp_wmb();
 	return tmp == RWSEM_UNLOCKED_VALUE;
 }
 
@@ -76,7 +70,6 @@ static inline void __up_read(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_dec_return((atomic_t *)(&sem->count));
 	if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
 		rwsem_wake(sem);
@@ -87,7 +80,6 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
-	smp_wmb();
 	if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
 			      (atomic_t *)(&sem->count)) < 0)
 		rwsem_wake(sem);
@@ -108,7 +100,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
 	if (tmp < 0)
 		rwsem_downgrade_wake(sem);
@@ -119,7 +110,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
  */
 static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
 {
-	smp_mb();
 	return atomic_add_return(delta, (atomic_t *)(&sem->count));
 }
 
diff --git a/arch/xtensa/include/asm/rwsem.h b/arch/xtensa/include/asm/rwsem.h
index 249619e7e7f2..593483f6e1ff 100644
--- a/arch/xtensa/include/asm/rwsem.h
+++ b/arch/xtensa/include/asm/rwsem.h
@@ -29,9 +29,7 @@
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	if (atomic_add_return(1,(atomic_t *)(&sem->count)) > 0)
-		smp_wmb();
-	else
+	if (atomic_add_return(1,(atomic_t *)(&sem->count)) <= 0)
 		rwsem_down_read_failed(sem);
 }
 
@@ -42,7 +40,6 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	while ((tmp = sem->count) >= 0) {
 		if (tmp == cmpxchg(&sem->count, tmp,
 				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			smp_wmb();
 			return 1;
 		}
 	}
@@ -58,9 +55,7 @@ static inline void __down_write(struct rw_semaphore *sem)
 
 	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
 				(atomic_t *)(&sem->count));
-	if (tmp == RWSEM_ACTIVE_WRITE_BIAS)
-		smp_wmb();
-	else
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
 		rwsem_down_write_failed(sem);
 }
 
@@ -70,7 +65,6 @@ static inline int __down_write_trylock(struct rw_semaphore *sem)
 
 	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
 		      RWSEM_ACTIVE_WRITE_BIAS);
-	smp_wmb();
 	return tmp == RWSEM_UNLOCKED_VALUE;
 }
 
@@ -81,7 +75,6 @@ static inline void __up_read(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_sub_return(1,(atomic_t *)(&sem->count));
 	if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
 		rwsem_wake(sem);
@@ -92,7 +85,6 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
-	smp_wmb();
 	if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
 			      (atomic_t *)(&sem->count)) < 0)
 		rwsem_wake(sem);
@@ -113,7 +105,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
 	if (tmp < 0)
 		rwsem_downgrade_wake(sem);
@@ -124,7 +115,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
  */
 static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
 {
-	smp_mb();
 	return atomic_add_return(delta, (atomic_t *)(&sem->count));
 }
 
-- 
2.7.0

* [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
  2016-02-29 12:58 ` [PATCH 01/11] locking, rwsem: get rid of __down_write_nested Michal Hocko
  2016-02-29 12:58 ` [PATCH 02/11] locking, rwsem: drop explicit memory barriers Michal Hocko
@ 2016-02-29 12:58 ` Michal Hocko
  2016-03-30 13:25   ` Peter Zijlstra
  2016-03-31  8:55   ` [PATCH] " Michal Hocko
  2016-02-29 12:58 ` [PATCH 04/11] alpha, rwsem: provide __down_write_killable Michal Hocko
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce a generic implementation necessary for down_write_killable.
This is a trivial extension of the already existing down_write call
which can be interrupted by SIGKILL. This patch doesn't provide
down_write_killable yet because the architectures have to provide the
necessary pieces first.

rwsem_down_write_failed, the generic slow path for the write lock, is
extended to take a task state and renamed to
__rwsem_down_write_failed_state. The return value is either a valid
semaphore pointer or ERR_PTR(-EINTR).

rwsem_down_write_failed_killable is exported as a new way to wait for
the lock and be killable.

For the rwsem-spinlock implementation the current __down_write is
updated in a similar way to __rwsem_down_write_failed_state, except
that it doesn't need new exports, just a visible __down_write_killable.

Architectures which are not using the generic rwsem implementation are
supposed to provide their own __down_write_killable implementation and
use rwsem_down_write_failed_killable for the slow path.
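
Such an arch-specific glue would look roughly like this (a sketch only;
___down_write stands for the architecture's existing fast path, see the
per-arch patches later in the series):

	static inline int __down_write_killable(struct rw_semaphore *sem)
	{
		if (unlikely(___down_write(sem)))
			if (IS_ERR(rwsem_down_write_failed_killable(sem)))
				return -EINTR;

		return 0;
	}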

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/asm-generic/rwsem.h     | 12 ++++++++++++
 include/linux/rwsem-spinlock.h  |  1 +
 include/linux/rwsem.h           |  2 ++
 kernel/locking/rwsem-spinlock.c | 23 +++++++++++++++++++++--
 kernel/locking/rwsem-xadd.c     | 31 +++++++++++++++++++++++++------
 5 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index b8d8a6cf4ca8..3fc94a046bf5 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -63,6 +63,18 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	long tmp;
+
+	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
+				     (atomic_long_t *)&sem->count);
+	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
diff --git a/include/linux/rwsem-spinlock.h b/include/linux/rwsem-spinlock.h
index a733a5467e6c..ae0528b834cd 100644
--- a/include/linux/rwsem-spinlock.h
+++ b/include/linux/rwsem-spinlock.h
@@ -34,6 +34,7 @@ struct rw_semaphore {
 extern void __down_read(struct rw_semaphore *sem);
 extern int __down_read_trylock(struct rw_semaphore *sem);
 extern void __down_write(struct rw_semaphore *sem);
+extern int __must_check __down_write_killable(struct rw_semaphore *sem);
 extern int __down_write_trylock(struct rw_semaphore *sem);
 extern void __up_read(struct rw_semaphore *sem);
 extern void __up_write(struct rw_semaphore *sem);
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 8f498cdde280..7d7ae029dac5 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -14,6 +14,7 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/atomic.h>
+#include <linux/err.h>
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 #include <linux/osq_lock.h>
 #endif
@@ -43,6 +44,7 @@ struct rw_semaphore {
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index bab26104a5d0..d1d04ca10d0e 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -191,11 +191,12 @@ int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * get a write lock on the semaphore
  */
-void __sched __down_write(struct rw_semaphore *sem)
+int __sched __down_write_state(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
 	unsigned long flags;
+	int ret = 0;
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
@@ -215,16 +216,34 @@ void __sched __down_write(struct rw_semaphore *sem)
 		 */
 		if (sem->count == 0)
 			break;
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		set_task_state(tsk, state);
 		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 		schedule();
+		if (signal_pending_state(state, current)) {
+			ret = -EINTR;
+			raw_spin_lock_irqsave(&sem->wait_lock, flags);
+			goto out;
+		}
 		raw_spin_lock_irqsave(&sem->wait_lock, flags);
 	}
 	/* got the lock */
 	sem->count = -1;
+out:
 	list_del(&waiter.list);
 
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+
+	return ret;
+}
+
+void __sched __down_write(struct rw_semaphore *sem)
+{
+	__down_write_state(sem, TASK_UNINTERRUPTIBLE);
+}
+
+int __sched __down_write_killable(struct rw_semaphore *sem)
+{
+	return __down_write_state(sem, TASK_KILLABLE);
 }
 
 /*
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index a4d4de05b2d1..5cec34f1ad6f 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -433,12 +433,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 /*
  * Wait until we successfully acquire the write lock
  */
-__visible
-struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
+static inline struct rw_semaphore *
+__rwsem_down_write_failed_state(struct rw_semaphore *sem, int state)
 {
 	long count;
 	bool waiting = true; /* any queued threads before us */
 	struct rwsem_waiter waiter;
+	struct rw_semaphore *ret = sem;
 
 	/* undo write bias from down_write operation, stop active locking */
 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
@@ -478,7 +479,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
 
 	/* wait until we successfully acquire the lock */
-	set_current_state(TASK_UNINTERRUPTIBLE);
+	set_current_state(state);
 	while (true) {
 		if (rwsem_try_write_lock(count, sem))
 			break;
@@ -487,20 +488,38 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 		/* Block until there are no active lockers. */
 		do {
 			schedule();
-			set_current_state(TASK_UNINTERRUPTIBLE);
+			if (signal_pending_state(state, current)) {
+				raw_spin_lock_irq(&sem->wait_lock);
+				ret = ERR_PTR(-EINTR);
+				goto out;
+			}
+			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
 	__set_current_state(TASK_RUNNING);
-
+out:
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
-	return sem;
+	return ret;
+}
+
+__visible struct rw_semaphore * __sched
+rwsem_down_write_failed(struct rw_semaphore *sem)
+{
+	return __rwsem_down_write_failed_state(sem, TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(rwsem_down_write_failed);
 
+__visible struct rw_semaphore * __sched
+rwsem_down_write_failed_killable(struct rw_semaphore *sem)
+{
+	return __rwsem_down_write_failed_state(sem, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(rwsem_down_write_failed_killable);
+
 /*
  * handle waking up a waiter on the semaphore
  * - up_read/up_write has decremented the active part of count if we come here
-- 
2.7.0

* [PATCH 04/11] alpha, rwsem: provide __down_write_killable
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
                   ` (2 preceding siblings ...)
  2016-02-29 12:58 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
@ 2016-02-29 12:58 ` Michal Hocko
  2016-02-29 12:58 ` [PATCH 05/11] ia64, " Michal Hocko
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce ___down_write for the fast path and reuse it for both
__down_write and __down_write_killable, each using the corresponding
generic slow path (rwsem_down_write_failed and
rwsem_down_write_failed_killable, respectively).

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/alpha/include/asm/rwsem.h | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
index a83bbea62c67..0131a7058778 100644
--- a/arch/alpha/include/asm/rwsem.h
+++ b/arch/alpha/include/asm/rwsem.h
@@ -63,7 +63,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	return res >= 0 ? 1 : 0;
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
+static inline long ___down_write(struct rw_semaphore *sem)
 {
 	long oldcount;
 #ifndef	CONFIG_SMP
@@ -83,10 +83,24 @@ static inline void __down_write(struct rw_semaphore *sem)
 	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
 	:"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory");
 #endif
-	if (unlikely(oldcount))
+	return oldcount;
+}
+
+static inline void __down_write(struct rw_semaphore *sem)
+{
+	if (unlikely(___down_write(sem)))
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (unlikely(___down_write(sem)))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
-- 
2.7.0

* [PATCH 05/11] ia64, rwsem: provide __down_write_killable
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
                   ` (3 preceding siblings ...)
  2016-02-29 12:58 ` [PATCH 04/11] alpha, rwsem: provide __down_write_killable Michal Hocko
@ 2016-02-29 12:58 ` Michal Hocko
  2016-02-29 12:58 ` [PATCH 06/11] s390, " Michal Hocko
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce ___down_write for the fast path and reuse it for both
__down_write and __down_write_killable, each using the corresponding
generic slow path (rwsem_down_write_failed and
rwsem_down_write_failed_killable, respectively).

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/ia64/include/asm/rwsem.h | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/ia64/include/asm/rwsem.h b/arch/ia64/include/asm/rwsem.h
index 3027e7516d85..5e78cb40d9df 100644
--- a/arch/ia64/include/asm/rwsem.h
+++ b/arch/ia64/include/asm/rwsem.h
@@ -49,8 +49,8 @@ __down_read (struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void
-__down_write (struct rw_semaphore *sem)
+static inline long
+___down_write (struct rw_semaphore *sem)
 {
 	long old, new;
 
@@ -59,10 +59,26 @@ __down_write (struct rw_semaphore *sem)
 		new = old + RWSEM_ACTIVE_WRITE_BIAS;
 	} while (cmpxchg_acq(&sem->count, old, new) != old);
 
-	if (old != 0)
+	return old;
+}
+
+static inline void
+__down_write (struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
 		rwsem_down_write_failed(sem);
 }
 
+static inline int
+__down_write_killable (struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 /*
  * unlock after reading
  */
-- 
2.7.0

* [PATCH 06/11] s390, rwsem: provide __down_write_killable
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
                   ` (4 preceding siblings ...)
  2016-02-29 12:58 ` [PATCH 05/11] ia64, " Michal Hocko
@ 2016-02-29 12:58 ` Michal Hocko
  2016-02-29 12:58 ` [PATCH 07/11] sh, " Michal Hocko
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce ___down_write for the fast path and reuse it for both
__down_write and __down_write_killable, each using the corresponding
generic slow path (rwsem_down_write_failed and
rwsem_down_write_failed_killable, respectively).

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/include/asm/rwsem.h | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
index 8e52e72f3efc..8e7b2b7e10f3 100644
--- a/arch/s390/include/asm/rwsem.h
+++ b/arch/s390/include/asm/rwsem.h
@@ -90,7 +90,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write(struct rw_semaphore *sem)
+static inline long ___down_write(struct rw_semaphore *sem)
 {
 	signed long old, new, tmp;
 
@@ -104,10 +104,25 @@ static inline void __down_write(struct rw_semaphore *sem)
 		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
 		: "Q" (sem->count), "m" (tmp)
 		: "cc", "memory");
-	if (old != 0)
+
+	return old;
+}
+
+static inline void __down_write(struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
-- 
2.7.0

* [PATCH 07/11] sh, rwsem: provide __down_write_killable
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
                   ` (5 preceding siblings ...)
  2016-02-29 12:58 ` [PATCH 06/11] s390, " Michal Hocko
@ 2016-02-29 12:58 ` Michal Hocko
  2016-02-29 12:58 ` [PATCH 08/11] sparc, " Michal Hocko
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Provide __down_write_killable which uses the same fast path as
__down_write except that it falls back to the
rwsem_down_write_failed_killable slow path and returns -EINTR if
killed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/sh/include/asm/rwsem.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
index f6c951c7a875..8a457b83d2a5 100644
--- a/arch/sh/include/asm/rwsem.h
+++ b/arch/sh/include/asm/rwsem.h
@@ -54,6 +54,19 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
+				(atomic_t *)(&sem->count));
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	int tmp;
-- 
2.7.0

* [PATCH 08/11] sparc, rwsem: provide __down_write_killable
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
                   ` (6 preceding siblings ...)
  2016-02-29 12:58 ` [PATCH 07/11] sh, " Michal Hocko
@ 2016-02-29 12:58 ` Michal Hocko
  2016-02-29 12:58 ` [PATCH 09/11] xtensa, " Michal Hocko
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Provide __down_write_killable which uses the same fast path as
__down_write except that it falls back to the
rwsem_down_write_failed_killable slow path and returns -EINTR if
killed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/sparc/include/asm/rwsem.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/sparc/include/asm/rwsem.h b/arch/sparc/include/asm/rwsem.h
index e5a0d575bc7f..c1e9e32e8b63 100644
--- a/arch/sparc/include/asm/rwsem.h
+++ b/arch/sparc/include/asm/rwsem.h
@@ -55,6 +55,19 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	long tmp;
+
+	tmp = atomic64_add_return(RWSEM_ACTIVE_WRITE_BIAS,
+				  (atomic64_t *)(&sem->count));
+	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
-- 
2.7.0

* [PATCH 09/11] xtensa, rwsem: provide __down_write_killable
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
                   ` (7 preceding siblings ...)
  2016-02-29 12:58 ` [PATCH 08/11] sparc, " Michal Hocko
@ 2016-02-29 12:58 ` Michal Hocko
  2016-02-29 12:58 ` [PATCH 10/11] x86, " Michal Hocko
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Provide __down_write_killable which uses the same fast path as
__down_write except that it falls back to the
rwsem_down_write_failed_killable slow path and returns -EINTR if
killed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/xtensa/include/asm/rwsem.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/xtensa/include/asm/rwsem.h b/arch/xtensa/include/asm/rwsem.h
index 593483f6e1ff..6283823b8040 100644
--- a/arch/xtensa/include/asm/rwsem.h
+++ b/arch/xtensa/include/asm/rwsem.h
@@ -59,6 +59,19 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
+				(atomic_t *)(&sem->count));
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	int tmp;
-- 
2.7.0

* [PATCH 10/11] x86, rwsem: provide __down_write_killable
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
                   ` (8 preceding siblings ...)
  2016-02-29 12:58 ` [PATCH 09/11] xtensa, " Michal Hocko
@ 2016-02-29 12:58 ` Michal Hocko
  2016-02-29 12:58 ` [PATCH 11/11] locking, rwsem: provide down_write_killable Michal Hocko
  2016-03-30 13:32 ` [PATCH 0/11] introduce down_write_killable for rw_semaphore Peter Zijlstra
  11 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Provide __down_write_killable which uses the same fast path as
__down_write except that it falls back to the
call_rwsem_down_write_failed_killable slow path and returns -EINTR if
killed. To prevent code duplication, extract the skeleton of
__down_write into a helper macro which just takes the semaphore and the
slow path function to be called.
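
In C terms the macro behaves roughly as follows (an illustrative sketch
only; the real implementation stays in asm and passes the result back
through the "+a" (ret) output constraint):

	static inline struct rw_semaphore *
	____down_write_c(struct rw_semaphore *sem,
			 struct rw_semaphore *(*slow_path)(struct rw_semaphore *))
	{
		long tmp = atomic_long_add_return(RWSEM_ACTIVE_WRITE_BIAS,
						  (atomic_long_t *)&sem->count);
		/* were there active lockers before us? */
		if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
			return slow_path(sem);	/* sem or ERR_PTR(-EINTR) */
		return sem;
	}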

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/x86/include/asm/rwsem.h | 41 ++++++++++++++++++++++++++++-------------
 arch/x86/lib/rwsem.S         |  8 ++++++++
 2 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index 4a8292a0d6e1..d759c5f70f49 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -99,21 +99,36 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
+#define ____down_write(sem, slow_path)			\
+({							\
+	long tmp;					\
+	struct rw_semaphore* ret = sem;			\
+	asm volatile("# beginning down_write\n\t"	\
+		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"	\
+		     /* adds 0xffff0001, returns the old value */ \
+		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
+		     /* was the active mask 0 before? */\
+		     "  jz        1f\n"			\
+		     "  call " slow_path "\n"		\
+		     "1:\n"				\
+		     "# ending down_write"		\
+		     : "+m" (sem->count), "=d" (tmp), "+a" (ret)	\
+		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
+		     : "memory", "cc");			\
+	ret;						\
+})
+
 static inline void __down_write(struct rw_semaphore *sem)
 {
-	long tmp;
-	asm volatile("# beginning down_write\n\t"
-		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"
-		     /* adds 0xffff0001, returns the old value */
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t"
-		     /* was the active mask 0 before? */
-		     "  jz        1f\n"
-		     "  call call_rwsem_down_write_failed\n"
-		     "1:\n"
-		     "# ending down_write"
-		     : "+m" (sem->count), "=d" (tmp)
-		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc");
+	____down_write(sem, "call_rwsem_down_write_failed");
+}
+
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (IS_ERR(____down_write(sem, "call_rwsem_down_write_failed_killable")))
+		return -EINTR;
+
+	return 0;
 }
 
 /*
diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
index be110efa0096..4534a7e912f3 100644
--- a/arch/x86/lib/rwsem.S
+++ b/arch/x86/lib/rwsem.S
@@ -106,6 +106,14 @@ ENTRY(call_rwsem_down_write_failed)
 	ret
 ENDPROC(call_rwsem_down_write_failed)
 
+ENTRY(call_rwsem_down_write_failed_killable)
+	save_common_regs
+	movq %rax,%rdi
+	call rwsem_down_write_failed_killable
+	restore_common_regs
+	ret
+ENDPROC(call_rwsem_down_write_failed_killable)
+
 ENTRY(call_rwsem_wake)
 	FRAME_BEGIN
 	/* do nothing if still outstanding active readers */
-- 
2.7.0

* [PATCH 11/11] locking, rwsem: provide down_write_killable
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
                   ` (9 preceding siblings ...)
  2016-02-29 12:58 ` [PATCH 10/11] x86, " Michal Hocko
@ 2016-02-29 12:58 ` Michal Hocko
  2016-03-30 13:32 ` [PATCH 0/11] introduce down_write_killable for rw_semaphore Peter Zijlstra
  11 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-02-29 12:58 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Now that all the architectures implement the necessary glue code we
can introduce down_write_killable. The only difference wrt. the regular
down_write is that the slow path waits in TASK_KILLABLE state and an
interruption by a fatal signal is reported as -EINTR to the caller.
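
From the caller's point of view the contract is simply (an illustrative
sketch; my_sem and do_something are made up for the example):

	static DECLARE_RWSEM(my_sem);

	static int do_something(void)
	{
		if (down_write_killable(&my_sem))
			return -EINTR;	/* woken up by a fatal signal */
		/* ... write-side critical section ... */
		up_write(&my_sem);
		return 0;
	}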

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/lockdep.h | 15 +++++++++++++++
 include/linux/rwsem.h   |  1 +
 kernel/locking/rwsem.c  | 19 +++++++++++++++++++
 3 files changed, 35 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index d026b190c530..accfe56d8c51 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -444,6 +444,18 @@ do {								\
 	lock_acquired(&(_lock)->dep_map, _RET_IP_);			\
 } while (0)
 
+#define LOCK_CONTENDED_RETURN(_lock, try, lock)			\
+({								\
+	int ____err = 0;					\
+	if (!try(_lock)) {					\
+		lock_contended(&(_lock)->dep_map, _RET_IP_);	\
+		____err = lock(_lock);				\
+	}							\
+	if (!____err)						\
+		lock_acquired(&(_lock)->dep_map, _RET_IP_);	\
+	____err;						\
+})
+
 #else /* CONFIG_LOCK_STAT */
 
 #define lock_contended(lockdep_map, ip) do {} while (0)
@@ -452,6 +464,9 @@ do {								\
 #define LOCK_CONTENDED(_lock, try, lock) \
 	lock(_lock)
 
+#define LOCK_CONTENDED_RETURN(_lock, try, lock) \
+	lock(_lock)
+
 #endif /* CONFIG_LOCK_STAT */
 
 #ifdef CONFIG_LOCKDEP
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 7d7ae029dac5..d1c12d160ace 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -118,6 +118,7 @@ extern int down_read_trylock(struct rw_semaphore *sem);
  * lock for writing
  */
 extern void down_write(struct rw_semaphore *sem);
+extern int __must_check down_write_killable(struct rw_semaphore *sem);
 
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 205be0ce34de..c817216c1615 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -55,6 +55,25 @@ void __sched down_write(struct rw_semaphore *sem)
 EXPORT_SYMBOL(down_write);
 
 /*
+ * lock for writing
+ */
+int __sched down_write_killable(struct rw_semaphore *sem)
+{
+	might_sleep();
+	rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
+
+	if (LOCK_CONTENDED_RETURN(sem, __down_write_trylock, __down_write_killable)) {
+		rwsem_release(&sem->dep_map, 1, _RET_IP_);
+		return -EINTR;
+	}
+
+	rwsem_set_owner(sem);
+	return 0;
+}
+
+EXPORT_SYMBOL(down_write_killable);
+
+/*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
 int down_write_trylock(struct rw_semaphore *sem)
-- 
2.7.0

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-02-29 12:58 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
@ 2016-03-30 13:25   ` Peter Zijlstra
  2016-03-31  8:33     ` Michal Hocko
  2016-03-31  8:55   ` [PATCH] " Michal Hocko
  1 sibling, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2016-03-30 13:25 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

On Mon, Feb 29, 2016 at 01:58:17PM +0100, Michal Hocko wrote:
> @@ -215,16 +216,34 @@ void __sched __down_write(struct rw_semaphore *sem)
>  		 */
>  		if (sem->count == 0)
>  			break;
> -		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> +		set_task_state(tsk, state);
>  		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
>  		schedule();
> +		if (signal_pending_state(state, current)) {
> +			ret = -EINTR;
> +			raw_spin_lock_irqsave(&sem->wait_lock, flags);
> +			goto out;
> +		}
>  		raw_spin_lock_irqsave(&sem->wait_lock, flags);
>  	}
>  	/* got the lock */
>  	sem->count = -1;

> @@ -487,20 +488,38 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
>  		/* Block until there are no active lockers. */
>  		do {
>  			schedule();
> -			set_current_state(TASK_UNINTERRUPTIBLE);
> +			if (signal_pending_state(state, current)) {
> +				raw_spin_lock_irq(&sem->wait_lock);
> +				ret = ERR_PTR(-EINTR);
> +				goto out;
> +			}
> +			set_current_state(state);
>  		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
>  
>  		raw_spin_lock_irq(&sem->wait_lock);
>  	}
>  	__set_current_state(TASK_RUNNING);

Why is the signal_pending_state() test _after_ the call to schedule()
and before the 'trylock'?

__mutex_lock_common() has it before the call to schedule and after the
'trylock'.

The difference is that rwsem will now respond to the KILL and return
-EINTR even if the lock is available, whereas mutex will acquire it and
ignore the signal (for a little while longer).

Neither is wrong per se, but I feel all the locking primitives should
behave in a consistent manner in this regard.
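
Schematically (not actual kernel code; trylock() stands in for the
respective fast path attempt):

	/* mutex: signal check after the trylock, before sleeping */
	for (;;) {
		if (trylock(lock))
			break;
		if (signal_pending_state(state, current))
			return -EINTR;
		schedule();
	}

	/* rwsem as posted: signal check after waking, before the trylock */
	for (;;) {
		schedule();
		if (signal_pending_state(state, current))
			return -EINTR;
		if (trylock(lock))
			break;
	}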

* Re: [PATCH 0/11] introduce down_write_killable for rw_semaphore
  2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
                   ` (10 preceding siblings ...)
  2016-02-29 12:58 ` [PATCH 11/11] locking, rwsem: provide down_write_killable Michal Hocko
@ 2016-03-30 13:32 ` Peter Zijlstra
  2016-03-31  8:59   ` Michal Hocko
  11 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2016-03-30 13:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

On Mon, Feb 29, 2016 at 01:58:14PM +0100, Michal Hocko wrote:
> I have tested on x86 with OOM situations with high mmap_sem contention
> (basically many parallel page faults racing with many parallel mmap/munmap
> tight loops) so the waiters for the write locks are routinely interrupted
> by SIGKILL.

Aside from the one niggle (as per the other email) they look good to me
and I would take them through the tip/locking tree.

Thanks!

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-03-30 13:25   ` Peter Zijlstra
@ 2016-03-31  8:33     ` Michal Hocko
  2016-03-31  8:44       ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-03-31  8:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

On Wed 30-03-16 15:25:49, Peter Zijlstra wrote:
[...]
> Why is the signal_pending_state() test _after_ the call to schedule()
> and before the 'trylock'.

No special reason. I guess I was just too focused on the wake_by_signal
path and didn't consider the trylock as well.

> __mutex_lock_common() has it before the call to schedule and after the
> 'trylock'.
> 
> The difference is that rwsem will now respond to the KILL and return
> -EINTR even if the lock is available, whereas mutex will acquire it and
> ignore the signal (for a little while longer).
> 
> Neither is wrong per se, but I feel all the locking primitives should
> behave in a consistent manner in this regard.

Agreed! What about the following on top? I will repost the full patch
if it looks OK.

Thanks!
---
diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index d1d04ca10d0e..fb2db7b408f0 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -216,14 +216,13 @@ int __sched __down_write_state(struct rw_semaphore *sem, int state)
 		 */
 		if (sem->count == 0)
 			break;
-		set_task_state(tsk, state);
-		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
-		schedule();
 		if (signal_pending_state(state, current)) {
 			ret = -EINTR;
-			raw_spin_lock_irqsave(&sem->wait_lock, flags);
 			goto out;
 		}
+		set_task_state(tsk, state);
+		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+		schedule();
 		raw_spin_lock_irqsave(&sem->wait_lock, flags);
 	}
 	/* got the lock */
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 5cec34f1ad6f..781b2628e41b 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -487,19 +487,19 @@ __rwsem_down_write_failed_state(struct rw_semaphore *sem, int state)
 
 		/* Block until there are no active lockers. */
 		do {
-			schedule();
 			if (signal_pending_state(state, current)) {
 				raw_spin_lock_irq(&sem->wait_lock);
 				ret = ERR_PTR(-EINTR);
 				goto out;
 			}
+			schedule();
 			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
-	__set_current_state(TASK_RUNNING);
 out:
+	__set_current_state(TASK_RUNNING);
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-03-31  8:33     ` Michal Hocko
@ 2016-03-31  8:44       ` Peter Zijlstra
  0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2016-03-31  8:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

On Thu, Mar 31, 2016 at 10:33:36AM +0200, Michal Hocko wrote:
> > __mutex_lock_common() has it before the call to schedule and after the
> > 'trylock'.
> > 
> > The difference is that rwsem will now respond to the KILL and return
> > -EINTR even if the lock is available, whereas mutex will acquire it and
> > ignore the signal (for a little while longer).
> > 
> > Neither is wrong per se, but I feel all the locking primitives should
> > behave in a consistent manner in this regard.
> 
> Agreed! What about the following on top? I will repost the full patch
> if it looks OK.

Yep, that seems to have the right shape to it.

Thanks!

* [PATCH] locking, rwsem: introduce basis for down_write_killable
  2016-02-29 12:58 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
  2016-03-30 13:25   ` Peter Zijlstra
@ 2016-03-31  8:55   ` Michal Hocko
  1 sibling, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-03-31  8:55 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce a generic implementation necessary for down_write_killable.
This is a trivial extension of the already existing down_write call
which can be interrupted by SIGKILL. This patch doesn't provide
down_write_killable yet because the architectures have to provide the
necessary pieces first.

rwsem_down_write_failed, the generic slow path for the write lock, is
extended to take a task state and renamed to
__rwsem_down_write_failed_state. The return value is either a valid
semaphore pointer or ERR_PTR(-EINTR).

rwsem_down_write_failed_killable is exported as a new way to wait for
the lock and be killable.

For the rwsem-spinlock implementation the current __down_write is
updated in a similar way to __rwsem_down_write_failed_state, except
that it doesn't need new exports, just a visible __down_write_killable.

Architectures which are not using the generic rwsem implementation are
supposed to provide their own __down_write_killable implementation and
use rwsem_down_write_failed_killable for the slow path.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/asm-generic/rwsem.h     | 12 ++++++++++++
 include/linux/rwsem-spinlock.h  |  1 +
 include/linux/rwsem.h           |  2 ++
 kernel/locking/rwsem-spinlock.c | 22 ++++++++++++++++++++--
 kernel/locking/rwsem-xadd.c     | 31 +++++++++++++++++++++++++------
 5 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index b8d8a6cf4ca8..3fc94a046bf5 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -63,6 +63,18 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	long tmp;
+
+	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
+				     (atomic_long_t *)&sem->count);
+	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
diff --git a/include/linux/rwsem-spinlock.h b/include/linux/rwsem-spinlock.h
index a733a5467e6c..ae0528b834cd 100644
--- a/include/linux/rwsem-spinlock.h
+++ b/include/linux/rwsem-spinlock.h
@@ -34,6 +34,7 @@ struct rw_semaphore {
 extern void __down_read(struct rw_semaphore *sem);
 extern int __down_read_trylock(struct rw_semaphore *sem);
 extern void __down_write(struct rw_semaphore *sem);
+extern int __must_check __down_write_killable(struct rw_semaphore *sem);
 extern int __down_write_trylock(struct rw_semaphore *sem);
 extern void __up_read(struct rw_semaphore *sem);
 extern void __up_write(struct rw_semaphore *sem);
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 8f498cdde280..7d7ae029dac5 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -14,6 +14,7 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/atomic.h>
+#include <linux/err.h>
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 #include <linux/osq_lock.h>
 #endif
@@ -43,6 +44,7 @@ struct rw_semaphore {
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index bab26104a5d0..fb2db7b408f0 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -191,11 +191,12 @@ int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * get a write lock on the semaphore
  */
-void __sched __down_write(struct rw_semaphore *sem)
+int __sched __down_write_state(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
 	unsigned long flags;
+	int ret = 0;
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
@@ -215,16 +216,33 @@ void __sched __down_write(struct rw_semaphore *sem)
 		 */
 		if (sem->count == 0)
 			break;
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		if (signal_pending_state(state, current)) {
+			ret = -EINTR;
+			goto out;
+		}
+		set_task_state(tsk, state);
 		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 		schedule();
 		raw_spin_lock_irqsave(&sem->wait_lock, flags);
 	}
 	/* got the lock */
 	sem->count = -1;
+out:
 	list_del(&waiter.list);
 
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+
+	return ret;
+}
+
+void __sched __down_write(struct rw_semaphore *sem)
+{
+	__down_write_state(sem, TASK_UNINTERRUPTIBLE);
+}
+
+int __sched __down_write_killable(struct rw_semaphore *sem)
+{
+	return __down_write_state(sem, TASK_KILLABLE);
 }
 
 /*
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index a4d4de05b2d1..781b2628e41b 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -433,12 +433,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 /*
  * Wait until we successfully acquire the write lock
  */
-__visible
-struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
+static inline struct rw_semaphore *
+__rwsem_down_write_failed_state(struct rw_semaphore *sem, int state)
 {
 	long count;
 	bool waiting = true; /* any queued threads before us */
 	struct rwsem_waiter waiter;
+	struct rw_semaphore *ret = sem;
 
 	/* undo write bias from down_write operation, stop active locking */
 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
@@ -478,7 +479,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
 
 	/* wait until we successfully acquire the lock */
-	set_current_state(TASK_UNINTERRUPTIBLE);
+	set_current_state(state);
 	while (true) {
 		if (rwsem_try_write_lock(count, sem))
 			break;
@@ -486,21 +487,39 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 
 		/* Block until there are no active lockers. */
 		do {
+			if (signal_pending_state(state, current)) {
+				raw_spin_lock_irq(&sem->wait_lock);
+				ret = ERR_PTR(-EINTR);
+				goto out;
+			}
 			schedule();
-			set_current_state(TASK_UNINTERRUPTIBLE);
+			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
+out:
 	__set_current_state(TASK_RUNNING);
-
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
-	return sem;
+	return ret;
+}
+
+__visible struct rw_semaphore * __sched
+rwsem_down_write_failed(struct rw_semaphore *sem)
+{
+	return __rwsem_down_write_failed_state(sem, TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(rwsem_down_write_failed);
 
+__visible struct rw_semaphore * __sched
+rwsem_down_write_failed_killable(struct rw_semaphore *sem)
+{
+	return __rwsem_down_write_failed_state(sem, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(rwsem_down_write_failed_killable);
+
 /*
  * handle waking up a waiter on the semaphore
  * - up_read/up_write has decremented the active part of count if we come here
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 52+ messages in thread
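
To make the intended use of the end result concrete, here is a minimal,
hypothetical caller of the final API (down_write_killable() itself is only
added by the last patch of the series; the mmap_sem user and the error
handling below are illustrative assumptions, not code from this series):

/*
 * Hypothetical caller, sketched for illustration only.  Assumes the
 * down_write_killable() wrapper that the final patch introduces.
 */
static int frob_address_space(struct mm_struct *mm)
{
	if (down_write_killable(&mm->mmap_sem))
		return -EINTR;	/* interrupted by a fatal signal */

	/* ... modify the address space under the write lock ... */

	up_write(&mm->mmap_sem);
	return 0;
}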

* Re: [PATCH 0/11] introduce down_write_killable for rw_semaphore
  2016-03-30 13:32 ` [PATCH 0/11] introduce down_write_killable for rw_semaphore Peter Zijlstra
@ 2016-03-31  8:59   ` Michal Hocko
  2016-03-31  9:20     ` Ingo Molnar
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-03-31  8:59 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton
  Cc: LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Chris Zankel, Max Filippov, x86,
	linux-alpha, linux-ia64, linux-s390, linux-sh, sparclinux,
	linux-xtensa, linux-arch

On Wed 30-03-16 15:32:17, Peter Zijlstra wrote:
> On Mon, Feb 29, 2016 at 01:58:14PM +0100, Michal Hocko wrote:
> > I have tested on x86 with OOM situations with high mmap_sem contention
> > (basically many parallel page faults racing with many parallel mmap/munmap
> > tight loops) so the waiters for the write locks are routinely interrupted
> > by SIGKILL.
> 
> Aside from the one niggle (as per the other email) they look good to me
> and I would take them through the tip/locking tree.

Thanks for the review! I understand that tip/locking would be the most
appropriate place but I am wondering whether this causes some issues
with the follow up patches which use this new API and which I expect to
go via Andrew's tree.

That being said, I do not care much, but then we would have a potential
dependency between mmotm and tip/locking.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/11] introduce down_write_killable for rw_semaphore
  2016-03-31  8:59   ` Michal Hocko
@ 2016-03-31  9:20     ` Ingo Molnar
  2016-03-31 10:58       ` Michal Hocko
  2016-03-31 17:03       ` Andrew Morton
  0 siblings, 2 replies; 52+ messages in thread
From: Ingo Molnar @ 2016-03-31  9:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Peter Zijlstra, Andrew Morton, LKML, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, David S. Miller, Tony Luck,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch


* Michal Hocko <mhocko@kernel.org> wrote:

> On Wed 30-03-16 15:32:17, Peter Zijlstra wrote:
> > On Mon, Feb 29, 2016 at 01:58:14PM +0100, Michal Hocko wrote:
> > > I have tested on x86 with OOM situations with high mmap_sem contention
> > > (basically many parallel page faults racing with many parallel mmap/munmap
> > > tight loops) so the waiters for the write locks are routinely interrupted
> > > by SIGKILL.
> > 
> > Aside from the one niggle (as per the other email) they look good to me
> > and I would take them through the tip/locking tree.
> 
> Thanks for the review! I understand that tip/locking would be the most 
> appropriate place [...]

Yes.

> [...] but I am wondering whether this causes some issues with the follow up 
> patches which use this new API and which I expect to go via Andrew's tree.

So AFAIK Andrew's tree is based on top of linux-next, so once it goes into 
tip:locking/core, -mm can pick it up as well 1-2 days later.

Please send the changes in isolation, for merge into the locking tree.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/11] introduce down_write_killable for rw_semaphore
  2016-03-31  9:20     ` Ingo Molnar
@ 2016-03-31 10:58       ` Michal Hocko
  2016-03-31 17:03       ` Andrew Morton
  1 sibling, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-03-31 10:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Andrew Morton, LKML, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, David S. Miller, Tony Luck,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Thu 31-03-16 11:20:05, Ingo Molnar wrote:
> 
> * Michal Hocko <mhocko@kernel.org> wrote:
> 
> > On Wed 30-03-16 15:32:17, Peter Zijlstra wrote:
> > > On Mon, Feb 29, 2016 at 01:58:14PM +0100, Michal Hocko wrote:
> > > > I have tested on x86 with OOM situations with high mmap_sem contention
> > > > (basically many parallel page faults racing with many parallel mmap/munmap
> > > > tight loops) so the waiters for the write locks are routinely interrupted
> > > > by SIGKILL.
> > > 
> > > Aside from the one niggle (as per the other email) they look good to me
> > > and I would take them through the tip/locking tree.
> > 
> > Thanks for the review! I understand that tip/locking would be the most 
> > appropriate place [...]
> 
> Yes.
> 
> > [...] but I am wondering whether this causes some issues with the follow up 
> > patches which use this new API and which I expect to go via Andrew's tree.
> 
> So AFAIK Andrew's tree is based on top of linux-next, so once it goes into 
> tip:locking/core, -mm can pick it up as well 1-2 days later.

OK. Andrew, just make sure you send the follow up changes to Linus after
tip/locking is merged.

> Please send the changes in isolation, for merge into the locking tree.

Yes, that's what I plan to do in a few days.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/11] introduce down_write_killable for rw_semaphore
  2016-03-31  9:20     ` Ingo Molnar
  2016-03-31 10:58       ` Michal Hocko
@ 2016-03-31 17:03       ` Andrew Morton
  2016-04-01  6:33         ` Ingo Molnar
  2016-04-01  7:26         ` Michal Hocko
  1 sibling, 2 replies; 52+ messages in thread
From: Andrew Morton @ 2016-03-31 17:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michal Hocko, Peter Zijlstra, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

On Thu, 31 Mar 2016 11:20:05 +0200 Ingo Molnar <mingo@kernel.org> wrote:

> So AFAIK Andrew's tree is based on top of linux-next

Not really true any more - I only base -mm patches on linux-next
patches when they must be based that way due to some known dependency.

I can certainly handle MM patches which are based on linux-next.  Such
an arrangement is going to make life awkward for Michal's
auto-maintained git tree of the -mm MM patches
(git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git).

Maybe I could merge down_write_killable into -mm's main MM section then
knock it out of my copy of linux-next, so everything is seamless for
mm.git.  I've done that before.

But it's all a bit of a pain - it would be simpler to keep
down_write_killable in the same tree as the patches which depend on it.


Michal, how about this?

mm.git currently takes the patches between

#NEXT_PATCHES_START mm
...
#NEXT_PATCHES_END


Can you change it to also take the patches between

#NEXT_PATCHES_START mm-post-next
...
#NEXT_PATCHES_END

?

That way I can do


#NEXT_PATCHES_START mm
...
#NEXT_PATCHES_END
...
linux-next.patch
revert-down_write_killable.patch
...
#NEXT_PATCHES_START mm-post-next
down_write_killable.patch
...
#NEXT_PATCHES_END

then everything you have should apply and run OK.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/11] introduce down_write_killable for rw_semaphore
  2016-03-31 17:03       ` Andrew Morton
@ 2016-04-01  6:33         ` Ingo Molnar
  2016-04-01  9:21           ` Michal Hocko
  2016-04-01  7:26         ` Michal Hocko
  1 sibling, 1 reply; 52+ messages in thread
From: Ingo Molnar @ 2016-04-01  6:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Peter Zijlstra, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 31 Mar 2016 11:20:05 +0200 Ingo Molnar <mingo@kernel.org> wrote:
> 
> > So AFAIK Andrew's tree is based on top of linux-next
> 
> Not really true any more - I only base -mm patches on linux-next
> patches when they must be based that way due to some known dependency.
> 
> I can certainly handle MM patches which are based on linux-next.  Such
> an arrangement is going to make life awkward for Michal's
> auto-maintained git tree of the -mm MM patches
> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git).
> 
> Maybe I could merge down_write_killable into -mm's main MM section then
> knock it out of my copy of linux-next, so everything is seamless for
> mm.git.  I've done that before.
> 
> But it's all a bit of a pain - it would be simpler to keep
> down_write_killable in the same tree as the patches which depend on it.

I can help on the Git level: I can do tip:locking/rwsem tree with only these 
changes, with stable sha1's, on which the remaining work can be based. The
locking tree typically goes in early during the merge window, so there's no
real dependencies.

On the source code level this series is changing the existing locking code too,
it doesn't just add a new orthogonal method or so, so I'd really like to have
it in the locking tree.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/11] introduce down_write_killable for rw_semaphore
  2016-03-31 17:03       ` Andrew Morton
  2016-04-01  6:33         ` Ingo Molnar
@ 2016-04-01  7:26         ` Michal Hocko
  2016-04-01  9:11           ` Andrew Morton
  1 sibling, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-04-01  7:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

On Thu 31-03-16 10:03:31, Andrew Morton wrote:
> On Thu, 31 Mar 2016 11:20:05 +0200 Ingo Molnar <mingo@kernel.org> wrote:
> 
> > So AFAIK Andrew's tree is based on top of linux-next
> 
> Not really true any more - I only base -mm patches on linux-next
> patches when they must be based that way due to some known dependency.
> 
> I can certainly handle MM patches which are based on linux-next.  Such
> an arrangement is going to make life awkward for Michal's
> auto-maintained git tree of the -mm MM patches
> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git).

Don't worry about my mmotm git tree. I will either cherry-pick these
patches from the tip or if Ingo can base them on 4.5 then I can pull
from his branch.

I was more worried about dependencies when you send your patch bomb to
Linus because then it is an additional burden on you to watch for tip
merge before you send yours.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/11] introduce down_write_killable for rw_semaphore
  2016-04-01  7:26         ` Michal Hocko
@ 2016-04-01  9:11           ` Andrew Morton
  0 siblings, 0 replies; 52+ messages in thread
From: Andrew Morton @ 2016-04-01  9:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

On Fri, 1 Apr 2016 09:26:54 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> I was more worried about dependencies when you send your patch bomb to
> Linus because then it is an additional burden on you to watch for tip
> merge before you send yours.

That happens pretty often.  I haven't screwed it up yet ;)

Usually stuff just won't compile, but I always have little notes-to-self
in the series file reminding me to check stuff.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/11] introduce down_write_killable for rw_semaphore
  2016-04-01  6:33         ` Ingo Molnar
@ 2016-04-01  9:21           ` Michal Hocko
  2016-04-01  9:50             ` Ingo Molnar
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-04-01  9:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Peter Zijlstra, LKML, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, David S. Miller, Tony Luck,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Fri 01-04-16 08:33:48, Ingo Molnar wrote:
[...]
> I can help on the Git level: I can do tip:locking/rwsem tree with only these 
> changes, with stable sha1's, on which the remaining work can be based. The
> locking tree typically goes in early during the merge window, so there's no
> real dependencies.

OK, I will wait for this series to appear in tip:locking/rwsem and then
post the follow up patches to Andrew. If you can base this branch on 4.5
that would be ideal for mmotm git tree workflow but I can work that
around should there be a complication.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/11] introduce down_write_killable for rw_semaphore
  2016-04-01  9:21           ` Michal Hocko
@ 2016-04-01  9:50             ` Ingo Molnar
  2016-04-01 10:52               ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Ingo Molnar @ 2016-04-01  9:50 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Peter Zijlstra, LKML, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, David S. Miller, Tony Luck,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch


* Michal Hocko <mhocko@kernel.org> wrote:

> On Fri 01-04-16 08:33:48, Ingo Molnar wrote:
> [...]
> > I can help on the Git level: I can do tip:locking/rwsem tree with only these 
> > changes, with stable sha1's, on which the remaining work can be based. The
> > locking tree typically goes in early during the merge window, so there's no
> > real dependencies.
> 
> OK, I will wait for this series to appear in tip:locking/rwsem and then post the 
> follow up patches to Andrew. [...]

Well, 'this series' was posted a month ago, and patch #3 had a discussion and some 
corrections to it - so please re-send with a v4.6-rc1 base. (v4.5 is fine too if 
you double check that v4.6-rc1 merges cleanly with it.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/11] introduce down_write_killable for rw_semaphore
  2016-04-01  9:50             ` Ingo Molnar
@ 2016-04-01 10:52               ` Michal Hocko
  0 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-04-01 10:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Peter Zijlstra, LKML, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, David S. Miller, Tony Luck,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Fri 01-04-16 11:50:00, Ingo Molnar wrote:
> 
> * Michal Hocko <mhocko@kernel.org> wrote:
> 
> > On Fri 01-04-16 08:33:48, Ingo Molnar wrote:
> > [...]
> > > I can help on the Git level: I can do tip:locking/rwsem tree with only these 
> > > changes, with stable sha1's, on which the remaining work can be based. The
> > > locking tree typically goes in early during the merge window, so there's no
> > > real dependencies.
> > 
> > OK, I will wait for this series to appear in tip:locking/rwsem and then post the 
> > follow up patches to Andrew. [...]
> 
> Well, 'this series' was posted a month ago, and patch #3 had a discussion and some 
> corrections to it - so please re-send with a v4.6-rc1 base. (v4.5 is fine too if 
> you double check that v4.6-rc1 merges cleanly with it.)

Sure will do.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-12 12:19                             ` Michal Hocko
  2016-05-12 13:58                               ` Peter Zijlstra
@ 2016-05-12 19:42                               ` Waiman Long
  1 sibling, 0 replies; 52+ messages in thread
From: Waiman Long @ 2016-05-12 19:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Peter Zijlstra, Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, Davidlohr Bueso

On 05/12/2016 08:19 AM, Michal Hocko wrote:
> On Thu 12-05-16 14:12:04, Peter Zijlstra wrote:
>> On Wed, May 11, 2016 at 08:03:46PM +0200, Michal Hocko wrote:
>>> I still cannot say I would understand why the pending
>>> RWSEM_WAITING_BIAS matters but I would probably need to look at the code
>>> again with a clean head, __rwsem_wake is quite tricky...
>> Ah, you're asking why an unconditional __rwsem_wake(ANY) isn't enough?
>>
>> Because; if at that point there's nobody waiting, we're left with an
>> empty list and WAITER_BIAS set. This in turn will make all fast paths
>> fail.
>>
>> Look at rwsem_down_read_failed() for instance; if we enter that we'll
>> unconditionally queue ourself, with nobody left to come wake us.
> This is still not clear to me because rwsem_down_read_failed will call
> __rwsem_do_wake if the count is RWSEM_WAITING_BIAS, so we shouldn't go to
> sleep and should get the lock instead. So you are right that we would force
> everybody to the slow path, which is not great but shouldn't cause incorrect
> behavior. I guess I must be missing something obvious here...

Because of writer lock stealing, having a count of RWSEM_WAITING_BIAS
doesn't mean the reader is guaranteed to get the lock even if it is the
first one in the queue. Calling __rwsem_do_wake() will take care of all
the locking and queue-checking work. Yes, I think it is a bit odd that a
task may end up waking itself. Maybe we can add code like:

--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -202,7 +202,8 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
           */
          smp_mb();
          waiter->task = NULL;
-        wake_up_process(tsk);
+        if (tsk != current)
+            wake_up_process(tsk);
          put_task_struct(tsk);
      } while (--loop);

Cheers,
Longman

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-12 12:19                             ` Michal Hocko
@ 2016-05-12 13:58                               ` Peter Zijlstra
  2016-05-12 19:42                               ` Waiman Long
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2016-05-12 13:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Thu, May 12, 2016 at 02:19:07PM +0200, Michal Hocko wrote:
> On Thu 12-05-16 14:12:04, Peter Zijlstra wrote:
> > On Wed, May 11, 2016 at 08:03:46PM +0200, Michal Hocko wrote:
> > > I still cannot say I would understand why the pending
> > > RWSEM_WAITING_BIAS matters but I would probably need to look at the code
> > > again with a clean head, __rwsem_wake is quite tricky...
> > 
> > Ah, you're asking why an unconditional __rwsem_wake(ANY) isn't enough?
> > 
> > Because; if at that point there's nobody waiting, we're left with an
> > empty list and WAITER_BIAS set. This in turn will make all fast paths
> > fail.
> > 
> > Look at rwsem_down_read_failed() for instance; if we enter that we'll
> > unconditionally queue ourself, with nobody left to come wake us.
> 
> This is still not clear to me because rwsem_down_read_failed will call
> __rwsem_do_wake if the count is RWSEM_WAITING_BIAS, so we shouldn't go to
> sleep and should get the lock instead. So you are right that we would force
> everybody to the slow path, which is not great but shouldn't cause incorrect
> behavior. I guess I must be missing something obvious here...


Ah me too; I missed the obvious: we do the __rwsem_do_wake() after we
add ourselves to the list, which means we'll also wake ourselves.

I'll have to do some more thinking...

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-12 12:12                           ` Peter Zijlstra
@ 2016-05-12 12:19                             ` Michal Hocko
  2016-05-12 13:58                               ` Peter Zijlstra
  2016-05-12 19:42                               ` Waiman Long
  0 siblings, 2 replies; 52+ messages in thread
From: Michal Hocko @ 2016-05-12 12:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Thu 12-05-16 14:12:04, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 08:03:46PM +0200, Michal Hocko wrote:
> > I still cannot say I would understand why the pending
> > RWSEM_WAITING_BIAS matters but I would probably need to look at the code
> > again with a clean head, __rwsem_wake is quite tricky...
> 
> Ah, you're asking why an unconditional __rwsem_wake(ANY) isn't enough?
> 
> Because; if at that point there's nobody waiting, we're left with an
> empty list and WAITER_BIAS set. This in turn will make all fast paths
> fail.
> 
> Look at rwsem_down_read_failed() for instance; if we enter that we'll
> unconditionally queue ourself, with nobody left to come wake us.

This is still not clear to me because rwsem_down_read_failed will call
__rwsem_do_wake if the count is RWSEM_WAITING_BIAS, so we shouldn't go to
sleep and should get the lock instead. So you are right that we would force
everybody to the slow path, which is not great but shouldn't cause incorrect
behavior. I guess I must be missing something obvious here...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11 18:03                         ` Michal Hocko
@ 2016-05-12 12:12                           ` Peter Zijlstra
  2016-05-12 12:19                             ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2016-05-12 12:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed, May 11, 2016 at 08:03:46PM +0200, Michal Hocko wrote:
> I still cannot say I would understand why the pending
> RWSEM_WAITING_BIAS matters but I would probably need to look at the code
> again with a clean head, __rwsem_wake is quite tricky...

Ah, you're asking why an unconditional __rwsem_wake(ANY) isn't enough?

Because; if at that point there's nobody waiting, we're left with an
empty list and WAITER_BIAS set. This in turn will make all fast paths
fail.

Look at rwsem_down_read_failed() for instance; if we enter that we'll
unconditionally queue ourself, with nobody left to come wake us.

^ permalink raw reply	[flat|nested] 52+ messages in thread
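
To see why a leftover RWSEM_WAITING_BIAS defeats the fast paths, compare
the generic read fast path (paraphrased from the 4.6-era
include/asm-generic/rwsem.h; the comment is added here): with a stale
negative count the increment result stays negative, so every down_read()
is pushed into the slow path described above.

static inline void __down_read(struct rw_semaphore *sem)
{
	/*
	 * A stale RWSEM_WAITING_BIAS keeps sem->count negative, so the
	 * result of this increment is <= 0 and we always fall into
	 * rwsem_down_read_failed() -- where nobody is left to wake us.
	 */
	if (unlikely(atomic_long_inc_return_acquire(
				(atomic_long_t *)&sem->count) <= 0))
		rwsem_down_read_failed(sem);
}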

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11 13:59                       ` Michal Hocko
@ 2016-05-11 18:03                         ` Michal Hocko
  2016-05-12 12:12                           ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-11 18:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 15:59:38, Michal Hocko wrote:
> On Wed 11-05-16 11:41:28, Peter Zijlstra wrote:
> > On Wed, May 11, 2016 at 11:31:27AM +0200, Michal Hocko wrote:
> > 
> > > Care to cook up a full patch?
> > 
> > compile tested only, if someone could please test it?
> 
> I have tried to run the test case from Tetsuo[1] with a small printk to
> show the interrupted writer case:
> [ 2753.596678] XXX: Writer interrupted. Woken waiters:0
> [ 2998.266978] XXX: Writer interrupted. Woken waiters:0
> 
> which means we hit the rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem) path,
> which is the problematic case. oom_reaper was always able to succeed, so I
> guess the patch works as expected. I will leave the test running longer to
> be sure.

And just for the reference I am able to reproduce the lockup without the
patch applied and the same test case and a debugging patch

[ 1522.036379] XXX interrupted. list_is_singular:1
[ 1523.040462] oom_reaper: unable to reap pid:3736 (tgid=3736)

I still cannot say I would understand why the pending
RWSEM_WAITING_BIAS matters but I would probably need to look at the code
again with a clean head, __rwsem_wake is quite tricky...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  9:41                     ` Peter Zijlstra
@ 2016-05-11 13:59                       ` Michal Hocko
  2016-05-11 18:03                         ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-11 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 11:41:28, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 11:31:27AM +0200, Michal Hocko wrote:
> 
> > Care to cook up a full patch?
> 
> compile tested only, if someone could please test it?

I have tried to run the test case from Tetsuo[1] with a small printk to
show the interrupted writer case:
[ 2753.596678] XXX: Writer interrupted. Woken waiters:0
[ 2998.266978] XXX: Writer interrupted. Woken waiters:0

which means we hit the rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem) path,
which is the problematic case. oom_reaper was always able to succeed, so I
guess the patch works as expected. I will leave the test running longer to
be sure.

[1] http://lkml.kernel.org/r/201605061958.HHG48967.JVFtSLFQOFOOMH@I-love.SAKURA.ne.jp
> 
> I'll go write a Changelog if it turns out to actually work ;-)
> 
> ---
>  kernel/locking/rwsem-xadd.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index df4dcb883b50..09e30c6225e5 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -487,23 +487,32 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
>  
>  		/* Block until there are no active lockers. */
>  		do {
> -			if (signal_pending_state(state, current)) {
> -				raw_spin_lock_irq(&sem->wait_lock);
> -				ret = ERR_PTR(-EINTR);
> -				goto out;
> -			}
> +			if (signal_pending_state(state, current))
> +				goto out_nolock;
> +
>  			schedule();
>  			set_current_state(state);
>  		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
>  
>  		raw_spin_lock_irq(&sem->wait_lock);
>  	}
> -out:
>  	__set_current_state(TASK_RUNNING);
>  	list_del(&waiter.list);
>  	raw_spin_unlock_irq(&sem->wait_lock);
>  
>  	return ret;
> +
> +out_nolock:
> +	__set_current_state(TASK_RUNNING);
> +	raw_spin_lock_irq(&sem->wait_lock);
> +	list_del(&waiter.list);
> +	if (list_empty(&sem->wait_list))
> +		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
> +	else
> +		__rwsem_do_wake(sem, RWSEM_WAKE_ANY);
> +	raw_spin_unlock_irq(&sem->wait_lock);
> +
> +	return ERR_PTR(-EINTR);
>  }
>  
>  __visible struct rw_semaphore * __sched

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  9:31                   ` Michal Hocko
@ 2016-05-11  9:41                     ` Peter Zijlstra
  2016-05-11 13:59                       ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2016-05-11  9:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed, May 11, 2016 at 11:31:27AM +0200, Michal Hocko wrote:

> Care to cook up a full patch?

compile tested only, if someone could please test it?

I'll go write a Changelog if it turns out to actually work ;-)

---
 kernel/locking/rwsem-xadd.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb883b50..09e30c6225e5 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -487,23 +487,32 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 
 		/* Block until there are no active lockers. */
 		do {
-			if (signal_pending_state(state, current)) {
-				raw_spin_lock_irq(&sem->wait_lock);
-				ret = ERR_PTR(-EINTR);
-				goto out;
-			}
+			if (signal_pending_state(state, current))
+				goto out_nolock;
+
 			schedule();
 			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
-out:
 	__set_current_state(TASK_RUNNING);
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
 	return ret;
+
+out_nolock:
+	__set_current_state(TASK_RUNNING);
+	raw_spin_lock_irq(&sem->wait_lock);
+	list_del(&waiter.list);
+	if (list_empty(&sem->wait_list))
+		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
+	else
+		__rwsem_do_wake(sem, RWSEM_WAKE_ANY);
+	raw_spin_unlock_irq(&sem->wait_lock);
+
+	return ERR_PTR(-EINTR);
 }
 
 __visible struct rw_semaphore * __sched

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  9:17                 ` Peter Zijlstra
@ 2016-05-11  9:31                   ` Michal Hocko
  2016-05-11  9:41                     ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-11  9:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 11:17:33, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 11:04:42AM +0200, Michal Hocko wrote:
> > On Wed 11-05-16 10:44:01, Peter Zijlstra wrote:
> > [...]
> > > @@ -504,6 +502,18 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
> > >  	raw_spin_unlock_irq(&sem->wait_lock);
> > >  
> > >  	return ret;
> > > +
> > > +out_nolock:
> > > +	__set_current_state(TASK_RUNNING);
> > > +	raw_spin_lock_irq(&sem->wait_lock);
> > > +	list_del(&waiter.list);
> > > +	if (list_empty(&sem->wait_list))
> > > +		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
> > > +	else
> > > +		__rwsem_do_wake(sem, RWSEM_WAKE_READERS);
> > > +	raw_spin_unlock_irq(&sem->wait_lock);
> > > +
> > > +	return ERR_PTR(-EINTR);
> > >  }
> > 
> > Looks much better but don't we have to check the count for potentially
> > pending writers?
> 
> Ah, so I was thinking that if we get here, there must still be an owner,
> otherwise we'd have acquired the lock. And if there is an owner, we
> cannot go wake writers. Hence the WAKE_READERS thing.

I was worried about the case where the owner is a writer and we would wake
readers, but I had missed that this wouldn't happen because of

	if (wake_type != RWSEM_WAKE_READ_OWNED) {
		adjustment = RWSEM_ACTIVE_READ_BIAS;
 try_reader_grant:
		oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
			/* A writer stole the lock. Undo our reader grant. */
			if (rwsem_atomic_update(-adjustment, sem) &
						RWSEM_ACTIVE_MASK)
				goto out;
			/* Last active locker left. Retry waking readers. */
			goto try_reader_grant;
		}
	}

> Then again, WAKE_ANY would not harm, in that if we do wake a pending
> writer it will not proceed if it cannot and it'll go back to sleep
> again.

true

> So yeah, maybe WAKE_ANY is the prudent thing to do.

I guess so.

Care to cook up a full patch?

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  9:04               ` Michal Hocko
@ 2016-05-11  9:17                 ` Peter Zijlstra
  2016-05-11  9:31                   ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2016-05-11  9:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed, May 11, 2016 at 11:04:42AM +0200, Michal Hocko wrote:
> On Wed 11-05-16 10:44:01, Peter Zijlstra wrote:
> [...]
> > @@ -504,6 +502,18 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
> >  	raw_spin_unlock_irq(&sem->wait_lock);
> >  
> >  	return ret;
> > +
> > +out_nolock:
> > +	__set_current_state(TASK_RUNNING);
> > +	raw_spin_lock_irq(&sem->wait_lock);
> > +	list_del(&waiter.list);
> > +	if (list_empty(&sem->wait_list))
> > +		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
> > +	else
> > +		__rwsem_do_wake(sem, RWSEM_WAKE_READERS);
> > +	raw_spin_unlock_irq(&sem->wait_lock);
> > +
> > +	return ERR_PTR(-EINTR);
> >  }
> 
> Looks much better but don't we have to check the count for potentially
> pending writers?

Ah, so I was thinking that if we get here, there must still be an owner,
otherwise we'd have acquired the lock. And if there is an owner, we
cannot go wake writers. Hence the WAKE_READERS thing.

Then again, WAKE_ANY would not harm, in that if we do wake a pending
writer it will not proceed if it cannot and it'll go back to sleep
again.

So yeah, maybe WAKE_ANY is the prudent thing to do.

^ permalink raw reply	[flat|nested] 52+ messages in thread
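
For reference, the wake types being discussed here are the enum from
kernel/locking/rwsem-xadd.c; paraphrased from the 4.6-era source (the
comments are lightly reworded):

enum rwsem_wake_type {
	RWSEM_WAKE_ANY,		/* wake whatever is at the head of the queue */
	RWSEM_WAKE_READERS,	/* wake readers only */
	RWSEM_WAKE_READ_OWNED	/* the waker already holds the read lock */
};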

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  8:44             ` Peter Zijlstra
@ 2016-05-11  9:04               ` Michal Hocko
  2016-05-11  9:17                 ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-11  9:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 10:44:01, Peter Zijlstra wrote:
[...]
> @@ -504,6 +502,18 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
>  	raw_spin_unlock_irq(&sem->wait_lock);
>  
>  	return ret;
> +
> +out_nolock:
> +	__set_current_state(TASK_RUNNING);
> +	raw_spin_lock_irq(&sem->wait_lock);
> +	list_del(&waiter.list);
> +	if (list_empty(&sem->wait_list))
> +		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
> +	else
> +		__rwsem_do_wake(sem, RWSEM_WAKE_READERS);
> +	raw_spin_unlock_irq(&sem->wait_lock);
> +
> +	return ERR_PTR(-EINTR);
>  }

Looks much better but don't we have to check the count for potentially
pending writers?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  8:35           ` Peter Zijlstra
@ 2016-05-11  9:02             ` Michal Hocko
  0 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-05-11  9:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 10:35:12, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 09:23:57AM +0200, Michal Hocko wrote:
> > On Tue 10-05-16 14:38:06, Peter Zijlstra wrote:
> 
> > > Also, looking at it again; I think we're forgetting to re-adjust the
> > > BIAS for the cancelled writer.
> > 
> > Hmm, __rwsem_down_write_failed_common does
> > 
> > 	/* undo write bias from down_write operation, stop active locking */
> > 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
> > 
> > which should remove the bias AFAIU.
> 
> Right; at this point we're neutral wrt bias.
> 
> > Later we do
> > 
> > 	if (waiting) {
> > 		count = READ_ONCE(sem->count);
> > 
> > 		/*
> > 		 * If there were already threads queued before us and there are
> > 		 * no active writers, the lock must be read owned; so we try to
> > 		 * wake any read locks that were queued ahead of us.
> > 		 */
> > 		if (count > RWSEM_WAITING_BIAS)
> > 			sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);
> > 
> > 	} else
> > 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
> > 
> > and that might set RWSEM_WAITING_BIAS but the current holder of the lock
> > should handle that correctly and wake the waiting tasks IIUC. I will go
> > and check the code closer. It is quite easy to get this subtle code
> > wrong..
> 
> Subtle; yes.
> 
> So if you look at rwsem_try_write_lock() -- traditionally the only way
> to exit this wait loop, you see it does:
> 
> 	if (count == RWSEM_WAITING_BIAS &&
> 	    cmpxchg_acquire(&sem->count, RWSEM_WAITING_BIAS,
> 		    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
> 		if (!list_is_singular(&sem->wait_list))
> 			rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
> 		rwsem_set_owner(sem);
> 		return true;
> 	}
> 
> Which ends up clearing RWSEM_WAITING_BIAS if we were the only waiter --
> or rather, it always clears WAITING, but then tests the list and re-sets
> it if there's more than one waiter on it.
> 
> Now, the signal break doesn't clear WAITING if we were the only waiter
> on the list; which means any further down_read() will block (I didn't
> look at what a subsequent down_write() would do).

I was staring at this part as well but then I convinced myself that this
is OK because rwsem_down_read_failed does:

	if (list_empty(&sem->wait_list))
		adjustment += RWSEM_WAITING_BIAS;
	list_add_tail(&waiter.list, &sem->wait_list);

	/* we're now waiting on the lock, but no longer actively locking */
	count = rwsem_atomic_update(adjustment, sem);

	/* If there are no active locks, wake the front queued process(es).
	 *
	 * If there are no writers and we are first in the queue,
	 * wake our own waiter to join the existing active readers !
	 */
	if (count == RWSEM_WAITING_BIAS ||
	    (count > RWSEM_WAITING_BIAS &&
	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
		sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);

__rwsem_do_wake should then see all the readers (including the current
one) and wake them up and set waiter->task to NULL to allow the current
one to break out of the loop as well.

> So I think we need something like this, to clear WAITING if we leave 
> the list empty.

But maybe this is the correct way to go.

> Does that make sense?

I do not see an immediate problem with this. Maybe Tetsuo can try this
out and see if it really makes a difference.

> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index df4dcb883b50..7011dd1c286c 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -489,6 +489,8 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
>  		do {
>  			if (signal_pending_state(state, current)) {
>  				raw_spin_lock_irq(&sem->wait_lock);
> +				if (list_is_singular(&sem->wait_list))
> +					rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
>  				ret = ERR_PTR(-EINTR);
>  				goto out;
>  			}

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  8:28           ` Michal Hocko
@ 2016-05-11  8:44             ` Peter Zijlstra
  2016-05-11  9:04               ` Michal Hocko
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2016-05-11  8:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed, May 11, 2016 at 10:28:53AM +0200, Michal Hocko wrote:

> Does the following look correct/reasonable? This is absolutely untested
> and more for discussion:

I would much rather see it in common; something like so perhaps.

---
 kernel/locking/rwsem-xadd.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb883b50..5d7f2831a475 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -487,11 +487,9 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 
 		/* Block until there are no active lockers. */
 		do {
-			if (signal_pending_state(state, current)) {
-				raw_spin_lock_irq(&sem->wait_lock);
-				ret = ERR_PTR(-EINTR);
-				goto out;
-			}
+			if (signal_pending_state(state, current))
+				goto out_nolock;
+
 			schedule();
 			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
@@ -504,6 +502,18 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 	raw_spin_unlock_irq(&sem->wait_lock);
 
 	return ret;
+
+out_nolock:
+	__set_current_state(TASK_RUNNING);
+	raw_spin_lock_irq(&sem->wait_lock);
+	list_del(&waiter.list);
+	if (list_empty(&sem->wait_list))
+		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
+	else
+		__rwsem_do_wake(sem, RWSEM_WAKE_READERS);
+	raw_spin_unlock_irq(&sem->wait_lock);
+
+	return ERR_PTR(-EINTR);
 }
 
 __visible struct rw_semaphore * __sched

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  7:23         ` Michal Hocko
  2016-05-11  8:28           ` Michal Hocko
@ 2016-05-11  8:35           ` Peter Zijlstra
  2016-05-11  9:02             ` Michal Hocko
  1 sibling, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2016-05-11  8:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed, May 11, 2016 at 09:23:57AM +0200, Michal Hocko wrote:
> On Tue 10-05-16 14:38:06, Peter Zijlstra wrote:

> > Also, looking at it again; I think we're forgetting to re-adjust the
> > BIAS for the cancelled writer.
> 
> Hmm, __rwsem_down_write_failed_common does
> 
> 	/* undo write bias from down_write operation, stop active locking */
> 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
> 
> which should remove the bias AFAIU.

Right; at this point we're neutral wrt bias.

> Later we do
> 
> 	if (waiting) {
> 		count = READ_ONCE(sem->count);
> 
> 		/*
> 		 * If there were already threads queued before us and there are
> 		 * no active writers, the lock must be read owned; so we try to
> 		 * wake any read locks that were queued ahead of us.
> 		 */
> 		if (count > RWSEM_WAITING_BIAS)
> 			sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);
> 
> 	} else
> 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
> 
> and that might set RWSEM_WAITING_BIAS but the current holder of the lock
> should handle that correctly and wake the waiting tasks IIUC. I will go
> and check the code closer. It is quite easy to get this subtle code
> wrong..

Subtle; yes.

So if you look at rwsem_try_write_lock() -- traditionally the only way
to exit this wait loop, you see it does:

	if (count == RWSEM_WAITING_BIAS &&
	    cmpxchg_acquire(&sem->count, RWSEM_WAITING_BIAS,
		    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
		if (!list_is_singular(&sem->wait_list))
			rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
		rwsem_set_owner(sem);
		return true;
	}

Which ends up clearing RWSEM_WAITING_BIAS if we were the only waiter --
or rather, it always clears WAITING, but then tests the list and re-sets
it if there's more than one waiter on it.

Now, the signal break doesn't clear WAITING if we were the only waiter
on the list; which means any further down_read() will block (I didn't
look at what a subsequent down_write() would do).

So I think we need something like this, to clear WAITING if we leave
the list empty.

Does that make sense?

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb883b50..7011dd1c286c 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -489,6 +489,8 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 		do {
 			if (signal_pending_state(state, current)) {
 				raw_spin_lock_irq(&sem->wait_lock);
+				if (list_is_singular(&sem->wait_list))
+					rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
 				ret = ERR_PTR(-EINTR);
 				goto out;
 			}

^ permalink raw reply related	[flat|nested] 52+ messages in thread
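
For readers trying to follow the bias arithmetic above, the constants are
defined per architecture; on 64-bit x86 they look roughly like this
(paraphrased from the 4.6-era arch/x86/include/asm/rwsem.h, shown only
for orientation):

/*
 * sem->count layout (paraphrased): the low 32 bits count active lockers,
 * and the count goes negative once waiters are queued.
 */
#define RWSEM_UNLOCKED_VALUE		0x00000000L
#define RWSEM_ACTIVE_BIAS		0x00000001L
#define RWSEM_ACTIVE_MASK		0xffffffffL
#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

So a single queued waiter with no active lockers leaves count equal to
RWSEM_WAITING_BIAS, which is exactly the value rwsem_try_write_lock()
cmpxchg()es against in the snippet quoted above.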

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  7:23         ` Michal Hocko
@ 2016-05-11  8:28           ` Michal Hocko
  2016-05-11  8:44             ` Peter Zijlstra
  2016-05-11  8:35           ` Peter Zijlstra
  1 sibling, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-11  8:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 09:23:57, Michal Hocko wrote:
> On Tue 10-05-16 14:38:06, Peter Zijlstra wrote:
[...]
> > However, at the time I was thinking that if we have:
> > 
> > 	reader (owner)
> > 	writer (pending)
> > 	reader (blocked on writer)
> > 
> > and writer would get cancelled, the up_read() would do a wakeup and kick
> > the blocked reader.
> > 
> > But yes, immediately kicking further pending waiters might be better.
> 
> OK, that makes sense. We shouldn't be waiting for the first reader to
> do up_read.

I am still trying to wrap my head around the wake up logic (I managed to
forget all the juicy details). I still do not see the correctness
issue with the current code so the following should be "just an
optimization". But as I've said I might be missing something here.

Does the following look correct/reasonable? This is absolutely untested
and more for discussion:
---
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb883b50..726620c97366 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -516,7 +516,27 @@ EXPORT_SYMBOL(rwsem_down_write_failed);
 __visible struct rw_semaphore * __sched
 rwsem_down_write_failed_killable(struct rw_semaphore *sem)
 {
-	return __rwsem_down_write_failed_common(sem, TASK_KILLABLE);
+	struct rw_semaphore *ret;
+
+	ret = __rwsem_down_write_failed_common(sem, TASK_KILLABLE);
+	if (IS_ERR(ret)) {
+		long count;
+
+		raw_spin_lock_irq(&sem->wait_lock);
+		count = READ_ONCE(sem->count);
+
+		/*
+		 * If there were already threads queued before us and there are
+		 * no active writers, the lock must be read owned; so we try to
+		 * wake any read locks that were queued ahead of us so they
+		 * do not have to wait for up_read to wake them up.
+		 */
+		if (count > RWSEM_WAITING_BIAS)
+			sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);
+		raw_spin_unlock_irq(&sem->wait_lock);
+	}
+
+	return ret;
 }
 EXPORT_SYMBOL(rwsem_down_write_failed_killable);
 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-10 12:38       ` Peter Zijlstra
  2016-05-10 13:57         ` Tetsuo Handa
@ 2016-05-11  7:23         ` Michal Hocko
  2016-05-11  8:28           ` Michal Hocko
  2016-05-11  8:35           ` Peter Zijlstra
  1 sibling, 2 replies; 52+ messages in thread
From: Michal Hocko @ 2016-05-11  7:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Tue 10-05-16 14:38:06, Peter Zijlstra wrote:
> On Tue, May 10, 2016 at 01:53:20PM +0200, Michal Hocko wrote:
> > On Tue 10-05-16 19:43:20, Tetsuo Handa wrote:
> > > I hit "allowing the OOM killer to select the same thread again" problem
> > > ( http://lkml.kernel.org/r/20160408113425.GF29820@dhcp22.suse.cz ), but
> > > I think that there is a bug in down_write_killable() series (at least
> > > "locking, rwsem: introduce basis for down_write_killable" patch).
> > > 
> > > Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160510-sem.txt.xz .
> > [...]
> > > 2 threads (PID: 1314 and 1443) are sleeping at rwsem_down_read_failed()
> > > but no thread is sleeping at rwsem_down_write_failed_killable().
> > > If there is no thread waiting for write lock, threads waiting for read
> > > lock must be able to run. This suggests that one of threads which was
> > > waiting for write lock forgot to wake up reader threads.
> > 
> > Or that the write lock holder is still keeping the lock held. I do not
> > see such a process in your list though. Is it possible that the
> > debug_show_all_locks would just miss it as it is not sleeping?
> > 
> > > Looking at rwsem_down_read_failed(), reader threads waiting for the
> > > writer thread to release the lock are waiting on sem->wait_list list.
> > > Looking at __rwsem_down_write_failed_common(), when the writer thread
> > > escaped the
> > > 
> > >                  /* Block until there are no active lockers. */
> > >                  do {
> > >                          if (signal_pending_state(state, current)) {
> > >                                  raw_spin_lock_irq(&sem->wait_lock);
> > >                                  ret = ERR_PTR(-EINTR);
> > >                                  goto out;
> > >                          }
> > >                          schedule();
> > >                          set_current_state(state);
> > >                  } while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> > > 
> > > loop due to SIGKILL, I think that the writer thread needs to check for
> > > remaining threads on sem->wait_list list and wake up reader threads
> > > before rwsem_down_write_failed_killable() returns -EINTR.
> > 
> > I am not sure I understand. The rwsem counter is not write locked while
> > the thread is sleeping and when we fail on the signal pending so readers
> > should be able to proceed, no?
> > 
> > Or are you suggesting that the failure path should call rwsem_wake? I
> > do not see __mutex_lock_common for killable wait doing something like
> > that and rwsem_wake is explicitly documented that it is called after the
> > lock state has been updated already. Now I might be missing something
> > subtle here but I guess the code is correct and it is more likely that
> > the holder of the lock wasn't killed but it is rather holding the lock
> > and doing something else.
> 
> Mutex is much simpler; it doesn't have to do the reader-vs-writer
> fairness thing.
> 
> However, at the time I was thinking that if we have:
> 
> 	reader (owner)
> 	writer (pending)
> 	reader (blocked on writer)
> 
> and writer would get cancelled, the up_read() would do a wakeup and kick
> the blocked reader.
> 
> But yes, immediately kicking further pending waiters might be better.

OK, that makes sense. We shouldn't be waiting for the first reader to
do up_read.

> Also, looking at it again; I think we're forgetting to re-adjust the
> BIAS for the cancelled writer.

Hmm, __rwsem_down_write_failed_common does

	/* undo write bias from down_write operation, stop active locking */
	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);

which should remove the bias AFAIU. Later we do

	if (waiting) {
		count = READ_ONCE(sem->count);

		/*
		 * If there were already threads queued before us and there are
		 * no active writers, the lock must be read owned; so we try to
		 * wake any read locks that were queued ahead of us.
		 */
		if (count > RWSEM_WAITING_BIAS)
			sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);

	} else
		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);

and that might set RWSEM_WAITING_BIAS but the current holder of the lock
should handle that correctly and wake the waiting tasks IIUC. I will go
and check the code closer. It is quite easy to get this subtle code
wrong...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-10 12:38       ` Peter Zijlstra
@ 2016-05-10 13:57         ` Tetsuo Handa
  2016-05-11  7:23         ` Michal Hocko
  1 sibling, 0 replies; 52+ messages in thread
From: Tetsuo Handa @ 2016-05-10 13:57 UTC (permalink / raw)
  To: peterz, mhocko
  Cc: linux-kernel, mingo, tglx, hpa, davem, tony.luck, akpm, chris,
	jcmvbkbc, dave, Waiman.Long

Michal Hocko wrote:
> On Tue 10-05-16 19:43:20, Tetsuo Handa wrote:
> > I hit "allowing the OOM killer to select the same thread again" problem
> > ( http://lkml.kernel.org/r/20160408113425.GF29820@dhcp22.suse.cz ), but
> > I think that there is a bug in down_write_killable() series (at least
> > "locking, rwsem: introduce basis for down_write_killable" patch).
> > 
> > Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160510-sem.txt.xz .
> [...]
> > 2 threads (PID: 1314 and 1443) are sleeping at rwsem_down_read_failed()
> > but no thread is sleeping at rwsem_down_write_failed_killable().
> > If there is no thread waiting for write lock, threads waiting for read
> > lock must be able to run. This suggests that one of threads which was
> > waiting for write lock forgot to wake up reader threads.
> 
> Or that the write lock holder is still keeping the lock held. I do not
> see such a process in your list though. Is it possible that the
> debug_show_all_locks would just miss it as it is not sleeping?

I don't think it is possible. This reproducer
( http://lkml.kernel.org/r/201605061958.HHG48967.JVFtSLFQOFOOMH@I-love.SAKURA.ne.jp )
creates a thread group with two threads, and both of these threads are
sleeping at rwsem_down_read_failed() waiting for mmap_sem.
SysRq-t suggests that PID 1443 called rwsem_down_write_failed_killable()
before calling rwsem_down_read_failed().

By the way, I suggested that you show traces of threads which are using the OOM victim's mm
( http://lkml.kernel.org/r/201603172200.CIE52148.QOVSOHJFMLOFtF@I-love.SAKURA.ne.jp ),
but you said that showing all locks held by !TASK_RUNNING threads would be useful
( http://lkml.kernel.org/r/20160323120716.GE7059@dhcp22.suse.cz ).
Do you admit that debug_show_all_locks() is not always useful by suspecting
the possibility of debug_show_all_locks() failing to report a thread which
held mmap_sem for write? (This is a kmallocwd topic, so I stop here.)

> 
> > Looking at rwsem_down_read_failed(), reader threads waiting for the
> > writer thread to release the lock are waiting on sem->wait_list list.
> > Looking at __rwsem_down_write_failed_common(), when the writer thread
> > escapes the
> > 
> >                  /* Block until there are no active lockers. */
> >                  do {
> >                          if (signal_pending_state(state, current)) {
> >                                  raw_spin_lock_irq(&sem->wait_lock);
> >                                  ret = ERR_PTR(-EINTR);
> >                                  goto out;
> >                          }
> >                          schedule();
> >                          set_current_state(state);
> >                  } while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> > 
> > loop due to SIGKILL, I think it needs to check for remaining threads
> > on the sem->wait_list list and wake up the reader threads before
> > rwsem_down_write_failed_killable() returns -EINTR.
> 
> I am not sure I understand. The rwsem counter is not write locked while
> the thread is sleeping, nor when we fail on the pending signal, so readers
> should be able to proceed, no?
> 
I guess __rwsem_do_wake() is needed for waking up the readers, because
the sequence that occurred was presumably:

  (1) PID 1314 requested down_read() and succeeded.
  (2) PID 1443 requested down_write_killable() and blocked.
  (3) The OOM killer sent SIGKILL to PID 1314 and PID 1443.
  (4) PID 1443 left down_write_killable() with -EINTR.
  (5) PID 1314 called up_read() and down_read()
      while PID 1443 called down_read().

> Or are you suggesting that the failure path should call rwsem_wake?

I don't know how rwsem works. Please consult maintainers.

Peter Zijlstra wrote:
> Mutex is much simpler; it doesn't have to do the reader-vs-writer
> fairness thing.
> 
> However, at the time I was thinking that if we have:
> 
> 	reader (owner)
> 	writer (pending)
> 	reader (blocked on writer)
> 
> and the writer would get cancelled, the up_read() would do a wakeup and kick
> the blocked reader.
> 
> But yes, immediately kicking further pending waiters might be better.
> 
> Also, looking at it again, I think we're forgetting to re-adjust the
> BIAS for the cancelled writer.

Yes, I think so.
> 
> Davidlohr, Waiman, can you look at this?
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-10 11:53     ` Michal Hocko
@ 2016-05-10 12:38       ` Peter Zijlstra
  2016-05-10 13:57         ` Tetsuo Handa
  2016-05-11  7:23         ` Michal Hocko
  0 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2016-05-10 12:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Tue, May 10, 2016 at 01:53:20PM +0200, Michal Hocko wrote:
> On Tue 10-05-16 19:43:20, Tetsuo Handa wrote:
> > I hit "allowing the OOM killer to select the same thread again" problem
> > ( http://lkml.kernel.org/r/20160408113425.GF29820@dhcp22.suse.cz ), but
> > I think that there is a bug in down_write_killable() series (at least
> > "locking, rwsem: introduce basis for down_write_killable" patch).
> > 
> > Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160510-sem.txt.xz .
> [...]
> > Two threads (PIDs 1314 and 1443) are sleeping at rwsem_down_read_failed(),
> > but no thread is sleeping at rwsem_down_write_failed_killable().
> > If there is no thread waiting for the write lock, threads waiting for the
> > read lock must be able to run. This suggests that one of the threads which
> > was waiting for the write lock forgot to wake up the reader threads.
> 
> Or that the write lock holder is still holding the lock. I do not
> see such a process in your list though. Is it possible that
> debug_show_all_locks() would just miss it as it is not sleeping?
> 
> > Looking at rwsem_down_read_failed(), reader threads waiting for the
> > writer thread to release the lock are waiting on sem->wait_list list.
> > Looking at __rwsem_down_write_failed_common(), when the writer thread
> > escapes the
> > 
> >                  /* Block until there are no active lockers. */
> >                  do {
> >                          if (signal_pending_state(state, current)) {
> >                                  raw_spin_lock_irq(&sem->wait_lock);
> >                                  ret = ERR_PTR(-EINTR);
> >                                  goto out;
> >                          }
> >                          schedule();
> >                          set_current_state(state);
> >                  } while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> > 
> > loop due to SIGKILL, I think it needs to check for remaining threads
> > on the sem->wait_list list and wake up the reader threads before
> > rwsem_down_write_failed_killable() returns -EINTR.
> 
> I am not sure I understand. The rwsem counter is not write locked while
> the thread is sleeping, nor when we fail on the pending signal, so readers
> should be able to proceed, no?
> 
> Or are you suggesting that the failure path should call rwsem_wake? I
> do not see __mutex_lock_common doing something like that for a killable
> wait, and rwsem_wake is explicitly documented as being called after the
> lock state has already been updated. Now I might be missing something
> subtle here, but I guess the code is correct and it is more likely that
> the holder of the lock wasn't killed but is rather holding the lock
> and doing something else.

Mutex is much simpler; it doesn't have to do the reader-vs-writer
fairness thing.

However, at the time I was thinking that if we have:

	reader (owner)
	writer (pending)
	reader (blocked on writer)

and the writer would get cancelled, the up_read() would do a wakeup and kick
the blocked reader.

But yes, immediately kicking further pending waiters might be better.

Also, looking at it again, I think we're forgetting to re-adjust the
BIAS for the cancelled writer.

Davidlohr, Waiman, can you look at this?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-10 10:43   ` Tetsuo Handa
@ 2016-05-10 11:53     ` Michal Hocko
  2016-05-10 12:38       ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Hocko @ 2016-05-10 11:53 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov

On Tue 10-05-16 19:43:20, Tetsuo Handa wrote:
> I hit "allowing the OOM killer to select the same thread again" problem
> ( http://lkml.kernel.org/r/20160408113425.GF29820@dhcp22.suse.cz ), but
> I think that there is a bug in down_write_killable() series (at least
> "locking, rwsem: introduce basis for down_write_killable" patch).
> 
> Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160510-sem.txt.xz .
[...]
> Two threads (PIDs 1314 and 1443) are sleeping at rwsem_down_read_failed(),
> but no thread is sleeping at rwsem_down_write_failed_killable().
> If there is no thread waiting for the write lock, threads waiting for the
> read lock must be able to run. This suggests that one of the threads which
> was waiting for the write lock forgot to wake up the reader threads.

Or that the write lock holder is still holding the lock. I do not
see such a process in your list though. Is it possible that
debug_show_all_locks() would just miss it as it is not sleeping?

> Looking at rwsem_down_read_failed(), reader threads waiting for the
> writer thread to release the lock are waiting on sem->wait_list list.
> Looking at __rwsem_down_write_failed_common(), when the writer thread
> escapes the
> 
>                  /* Block until there are no active lockers. */
>                  do {
>                          if (signal_pending_state(state, current)) {
>                                  raw_spin_lock_irq(&sem->wait_lock);
>                                  ret = ERR_PTR(-EINTR);
>                                  goto out;
>                          }
>                          schedule();
>                          set_current_state(state);
>                  } while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> 
> loop due to SIGKILL, I think it needs to check for remaining threads
> on the sem->wait_list list and wake up the reader threads before
> rwsem_down_write_failed_killable() returns -EINTR.

I am not sure I understand. The rwsem counter is not write locked while
the thread is sleeping, nor when we fail on the pending signal, so readers
should be able to proceed, no?

Or are you suggesting that the failure path should call rwsem_wake? I
do not see __mutex_lock_common doing something like that for a killable
wait, and rwsem_wake is explicitly documented as being called after the
lock state has already been updated. Now I might be missing something
subtle here, but I guess the code is correct and it is more likely that
the holder of the lock wasn't killed but is rather holding the lock
and doing something else.
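
For reference, the comment above rwsem_wake in kernel/locking/rwsem-xadd.c
reads:

/*
 * handle waking up a waiter on the semaphore
 * - up_read/up_write has decremented the active part of count if we come here
 */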
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-01 11:04 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
  2016-04-02  4:41   ` Davidlohr Bueso
@ 2016-05-10 10:43   ` Tetsuo Handa
  2016-05-10 11:53     ` Michal Hocko
  1 sibling, 1 reply; 52+ messages in thread
From: Tetsuo Handa @ 2016-05-10 10:43 UTC (permalink / raw)
  To: Michal Hocko, LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Michal Hocko

I hit "allowing the OOM killer to select the same thread again" problem
( http://lkml.kernel.org/r/20160408113425.GF29820@dhcp22.suse.cz ), but
I think that there is a bug in down_write_killable() series (at least
"locking, rwsem: introduce basis for down_write_killable" patch).

Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160510-sem.txt.xz .
----------
[   48.303867] Out of memory: Kill process 1314 (tgid=1314) score 1000 or sacrifice child
[   48.308582] Killed process 1314 (tgid=1314) total-vm:70844kB, anon-rss:1980kB, file-rss:0kB, shmem-rss:0kB
[   49.323719] oom_reaper: unable to reap pid:1314 (tgid=1314)
[   49.338146]
[   49.338146] Showing all locks held in the system:
(...snipped...)
[   49.801071] 1 lock held by tgid=1314/1314:
[   49.803953]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810fddac>] acct_collect+0x5c/0x1e0
[   49.809200] 1 lock held by tgid=1314/1443:
[   49.812102]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81073a45>] do_exit+0x175/0xb10
(...snipped...)
[   51.188928] oom_reaper: unable to reap pid:1443 (tgid=1314)
[   55.576750] oom_reaper: unable to reap pid:1314 (tgid=1314)
[   57.717917] oom_reaper: unable to reap pid:1314 (tgid=1314)
[   59.285880] oom_reaper: unable to reap pid:1314 (tgid=1314)
[   60.818697] oom_reaper: unable to reap pid:1314 (tgid=1314)
(...snipped...)
[  174.429572] tgid=1314       D ffff88003ad93b90     0  1314   1209 0x00100084
[  174.429573]  ffff88003ad93b90 ffff88003ad8f6b8 ffff880039c3a140 ffff88003ad8c080
[  174.429574]  ffff88003ad94000 ffff88003ad8f6a0 ffff88003ad8f6b8 0000000000000000
[  174.429575]  0000000000000008 ffff88003ad93ba8 ffffffff81616190 ffff88003ad8c080
[  174.429585] Call Trace:
[  174.429586]  [<ffffffff81616190>] schedule+0x30/0x80
[  174.429587]  [<ffffffff81619e26>] rwsem_down_read_failed+0xd6/0x140
[  174.429589]  [<ffffffff812dd6f8>] call_rwsem_down_read_failed+0x18/0x30
[  174.429590]  [<ffffffff816196dd>] down_read+0x3d/0x50
[  174.429592]  [<ffffffff810fddac>] ? acct_collect+0x5c/0x1e0
[  174.429593]  [<ffffffff810fddac>] acct_collect+0x5c/0x1e0
[  174.429594]  [<ffffffff81073ff5>] do_exit+0x725/0xb10
[  174.429594]  [<ffffffff81074467>] do_group_exit+0x47/0xc0
[  174.429596]  [<ffffffff8108075f>] get_signal+0x20f/0x7b0
[  174.429597]  [<ffffffff81024fb2>] do_signal+0x32/0x700
[  174.429598]  [<ffffffff810bdc69>] ? trace_hardirqs_on+0x9/0x10
[  174.429599]  [<ffffffff810c3552>] ? rwsem_wake+0x72/0xe0
[  174.429600]  [<ffffffff812dd78b>] ? call_rwsem_wake+0x1b/0x30
[  174.429601]  [<ffffffff810b9ee0>] ? up_read+0x30/0x40
[  174.429602]  [<ffffffff8106b495>] ? exit_to_usermode_loop+0x29/0x9e
[  174.429603]  [<ffffffff8106b4bf>] exit_to_usermode_loop+0x53/0x9e
[  174.429604]  [<ffffffff8100348d>] prepare_exit_to_usermode+0x7d/0x90
[  174.429605]  [<ffffffff8161bd3e>] retint_user+0x8/0x23
[  174.429605] tgid=1314       D ffff88003aa2fbd0     0  1443   1209 0x00000084
[  174.429607]  ffff88003aa2fbd0 ffff88003ad8f6b8 ffff8800382060c0 ffff88003aa2a140
[  174.429608]  ffff88003aa30000 ffff88003ad8f6a0 ffff88003ad8f6b8 ffff88003aa2a140
[  174.429609]  0000000000000008 ffff88003aa2fbe8 ffffffff81616190 ffff88003aa2a140
[  174.429610] Call Trace:
[  174.429611]  [<ffffffff81616190>] schedule+0x30/0x80
[  174.429612]  [<ffffffff81619e26>] rwsem_down_read_failed+0xd6/0x140
[  174.429613]  [<ffffffff810bdb99>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[  174.429614]  [<ffffffff812dd6f8>] call_rwsem_down_read_failed+0x18/0x30
[  174.429615]  [<ffffffff816196dd>] down_read+0x3d/0x50
[  174.429616]  [<ffffffff81073a45>] ? do_exit+0x175/0xb10
[  174.429616]  [<ffffffff81073a45>] do_exit+0x175/0xb10
[  174.429617]  [<ffffffff81074467>] do_group_exit+0x47/0xc0
[  174.429618]  [<ffffffff8108075f>] get_signal+0x20f/0x7b0
[  174.429619]  [<ffffffff81024fb2>] do_signal+0x32/0x700
[  174.429620]  [<ffffffff8161acae>] ? _raw_spin_unlock_irq+0x2e/0x40
[  174.429621]  [<ffffffff8161a2bf>] ? rwsem_down_write_failed_killable+0x1ef/0x280
[  174.429631]  [<ffffffff8106b555>] ? syscall_slow_exit_work+0x4b/0x10d
[  174.429632]  [<ffffffff8106b495>] ? exit_to_usermode_loop+0x29/0x9e
[  174.429633]  [<ffffffff8106b4bf>] exit_to_usermode_loop+0x53/0x9e
[  174.429634]  [<ffffffff81003715>] do_syscall_64+0x135/0x1b0
[  174.429635]  [<ffffffff8161b43f>] entry_SYSCALL64_slow_path+0x25/0x25
(...snipped...)
[  217.651477] oom_reaper: unable to reap pid:1314 (tgid=1314)
[  219.071975] oom_reaper: unable to reap pid:1314 (tgid=1314)
[  220.508961] oom_reaper: unable to reap pid:1314 (tgid=1314)
[  222.022111] oom_reaper: unable to reap pid:1314 (tgid=1314)
[  223.560166] oom_reaper: unable to reap pid:1314 (tgid=1314)
[  225.267750] oom_reaper: unable to reap pid:1314 (tgid=1314)
----------

Two threads (PIDs 1314 and 1443) are sleeping at rwsem_down_read_failed(),
but no thread is sleeping at rwsem_down_write_failed_killable().
If there is no thread waiting for the write lock, threads waiting for the
read lock must be able to run. This suggests that one of the threads which
was waiting for the write lock forgot to wake up the reader threads.

Looking at rwsem_down_read_failed(), reader threads waiting for the
writer thread to release the lock are waiting on sem->wait_list list.
Looking at __rwsem_down_write_failed_common(), when the writer thread
escapes the

                 /* Block until there are no active lockers. */
                 do {
                         if (signal_pending_state(state, current)) {
                                 raw_spin_lock_irq(&sem->wait_lock);
                                 ret = ERR_PTR(-EINTR);
                                 goto out;
                         }
                         schedule();
                         set_current_state(state);
                 } while ((count = sem->count) & RWSEM_ACTIVE_MASK);

loop due to SIGKILL, I think it needs to check for remaining threads
on the sem->wait_list list and wake up the reader threads before
rwsem_down_write_failed_killable() returns -EINTR.
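
Something along these lines (hypothetical and untested, just to illustrate
the suggestion; __rwsem_do_wake and RWSEM_WAKE_READERS are the existing
helpers in rwsem-xadd.c) in the -EINTR path, while sem->wait_lock is still
held:

	/* hypothetical: wake the readers queued behind the cancelled
	 * writer before returning -EINTR */
	if (!list_empty(&sem->wait_list))
		__rwsem_do_wake(sem, RWSEM_WAKE_READERS);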

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-07  6:58       ` Davidlohr Bueso
@ 2016-04-07  7:38         ` Michal Hocko
  0 siblings, 0 replies; 52+ messages in thread
From: Michal Hocko @ 2016-04-07  7:38 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Wed 06-04-16 23:58:07, Davidlohr Bueso wrote:
> On Mon, 04 Apr 2016, Michal Hocko wrote:
> 
> >Not sure I got your point here.
> 
> You set current to TASK_KILLABLE in the sleep loop, why do you want to change
> it here to TASK_RUNNING if it's about to be killed?

Wouldn't it be unexpected to return from a lock with something other than
TASK_RUNNING?

> At least in the case of
> UNINTERRUPTIBLE we do it merely as a redundancy after breaking out of the
> loop. Of course we also acquired the lock in the first place by that time and
> we _better_ be running.

I guess the reason was that rwsem_try_write_lock might succeed and we do
not want to return with TASK_UNINTERRUPTIBLE in that case.
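
i.e. in the patch both the lock-acquired path and the -EINTR path fall
through the same tail:

	out:
		__set_current_state(TASK_RUNNING);
		list_del(&waiter.list);
		raw_spin_unlock_irq(&sem->wait_lock);

		return ret;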

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-04  9:17     ` Michal Hocko
  2016-04-04  9:21       ` Peter Zijlstra
@ 2016-04-07  6:58       ` Davidlohr Bueso
  2016-04-07  7:38         ` Michal Hocko
  1 sibling, 1 reply; 52+ messages in thread
From: Davidlohr Bueso @ 2016-04-07  6:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Mon, 04 Apr 2016, Michal Hocko wrote:

>Not sure I got your point here.

You set current to TASK_KILLABLE in the sleep loop, why do you want to change
it here to TASK_RUNNING if it's about to be killed? At least in the case of
UNINTERRUPTIBLE we do it merely as a redundancy after breaking out of the
loop. Of course we also acquired the lock in the first place by that time and
we _better_ be running.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-04  9:17     ` Michal Hocko
@ 2016-04-04  9:21       ` Peter Zijlstra
  2016-04-07  6:58       ` Davidlohr Bueso
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2016-04-04  9:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Davidlohr Bueso, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Mon, Apr 04, 2016 at 11:17:00AM +0200, Michal Hocko wrote:
> > >@@ -486,21 +487,39 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
> > >
> > >		/* Block until there are no active lockers. */
> > >		do {
> > >+			if (signal_pending_state(state, current)) {
> > 
> >                           ^^ unlikely()?
> 
> The generated code is identical after I've added unlikely. I haven't
> tried more gcc versions (mine is 5.3.1), but is this worth it?

Both signal_pending() and __fatal_signal_pending() already have an
unlikely(), which is why adding it here is superfluous.
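
For reference, the relevant helpers look roughly like this (quoting from
memory from include/linux/sched.h of that era, so treat the exact form as
approximate):

static inline int signal_pending(struct task_struct *p)
{
	return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING));
}

static inline int __fatal_signal_pending(struct task_struct *p)
{
	return unlikely(sigismember(&p->pending.signal, SIGKILL));
}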

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-02  4:41   ` Davidlohr Bueso
@ 2016-04-04  9:17     ` Michal Hocko
  2016-04-04  9:21       ` Peter Zijlstra
  2016-04-07  6:58       ` Davidlohr Bueso
  0 siblings, 2 replies; 52+ messages in thread
From: Michal Hocko @ 2016-04-04  9:17 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Fri 01-04-16 21:41:25, Davidlohr Bueso wrote:
> On Fri, 01 Apr 2016, Michal Hocko wrote:
> 
> >From: Michal Hocko <mhocko@suse.com>
> >
> >Introduce a generic implementation necessary for down_write_killable.
> >This is a trivial extension of the already existing down_write call,
> >one which can be interrupted by SIGKILL.  This patch doesn't provide
> >down_write_killable yet because arches have to provide the necessary
> >pieces first.
> >
> >rwsem_down_write_failed, which is the generic slow path for the
> >write lock, is extended to take a task state and renamed to
> >__rwsem_down_write_failed_state. The return value is either a valid
> >semaphore pointer or ERR_PTR(-EINTR).
> >
> >rwsem_down_write_failed_killable is exported as a new way to wait for
> >the lock and be killable.
> >
> >For the rwsem-spinlock implementation the current __down_write is updated
> >in a similar way to __rwsem_down_write_failed_state, except that it doesn't
> >need new exports, just a visible __down_write_killable.
> >
> >Architectures which are not using the generic rwsem implementation are
> >supposed to provide their __down_write_killable implementation and
> >use rwsem_down_write_failed_killable for the slow path.
> 
> So in a nutshell, this is supposed to be the (writer) rwsem counterpart of
> mutex_lock_killable() and down_killable(), right?

Yes.
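
Usage would then mirror the mutex counterpart; a sketch of a typical
mmap_sem write user, once patch 11 ties the API together:

	if (down_write_killable(&mm->mmap_sem))
		return -EINTR;
	/* mmap_sem is held for write here */
	up_write(&mm->mmap_sem);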

> [...]
> 
> >--- a/kernel/locking/rwsem-xadd.c
> >+++ b/kernel/locking/rwsem-xadd.c
> >@@ -433,12 +433,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
> >/*
> > * Wait until we successfully acquire the write lock
> > */
> >-__visible
> >-struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
> >+static inline struct rw_semaphore *
> >+__rwsem_down_write_failed_state(struct rw_semaphore *sem, int state)
> 
> FWIW I'm not a fan of the _state naming. While I understand why you chose it, I feel
> it does not really describe the purpose of the call itself. The state logic alone is
> really quite small and therefore should not govern the function name. Why not just apply
> KISS and label things _common, i.e. like mutexes do? This would also standardize names a
> bit.

I really do not care much about naming. So if _common sounds better I
can certainly rename.

> 
> >{
> >	long count;
> >	bool waiting = true; /* any queued threads before us */
> >	struct rwsem_waiter waiter;
> >+	struct rw_semaphore *ret = sem;
> >
> >	/* undo write bias from down_write operation, stop active locking */
> >	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
> >@@ -478,7 +479,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
> >		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
> >
> >	/* wait until we successfully acquire the lock */
> >-	set_current_state(TASK_UNINTERRUPTIBLE);
> >+	set_current_state(state);
> >	while (true) {
> >		if (rwsem_try_write_lock(count, sem))
> >			break;
> >@@ -486,21 +487,39 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
> >
> >		/* Block until there are no active lockers. */
> >		do {
> >+			if (signal_pending_state(state, current)) {
> 
>                           ^^ unlikely()?

The generated code is identical after I've added unlikely. I haven't
tried more gcc versions (mine is 5.3.1), but is this worth it?

> 
> >+				raw_spin_lock_irq(&sem->wait_lock);
> 
> If the lock is highly contended + a bad workload for spin-on-owner, this could take a while :)
> Of course, this is a side effect of the "wait until no active lockers" optimization which avoids
> the wait_lock in the first place, so fortunately it somewhat mitigates the situation.
> 
> >+				ret = ERR_PTR(-EINTR);
> >+				goto out;
> >+			}
> >			schedule();
> >-			set_current_state(TASK_UNINTERRUPTIBLE);
> >+			set_current_state(state);
> >		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> >
> >		raw_spin_lock_irq(&sem->wait_lock);
> >	}
> >+out:
> >	__set_current_state(TASK_RUNNING);
> >-
> 
> You certainly don't want this if exiting due to the TASK_KILLABLE situation.

Not sure I got your point here.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-01 11:04 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
@ 2016-04-02  4:41   ` Davidlohr Bueso
  2016-04-04  9:17     ` Michal Hocko
  2016-05-10 10:43   ` Tetsuo Handa
  1 sibling, 1 reply; 52+ messages in thread
From: Davidlohr Bueso @ 2016-04-02  4:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Michal Hocko

On Fri, 01 Apr 2016, Michal Hocko wrote:

>From: Michal Hocko <mhocko@suse.com>
>
>Introduce a generic implementation necessary for down_write_killable.
>This is a trivial extension of the already existing down_write call,
>one which can be interrupted by SIGKILL.  This patch doesn't provide
>down_write_killable yet because arches have to provide the necessary
>pieces first.
>
>rwsem_down_write_failed, which is the generic slow path for the
>write lock, is extended to take a task state and renamed to
>__rwsem_down_write_failed_state. The return value is either a valid
>semaphore pointer or ERR_PTR(-EINTR).
>
>rwsem_down_write_failed_killable is exported as a new way to wait for
>the lock and be killable.
>
>For the rwsem-spinlock implementation the current __down_write is updated
>in a similar way to __rwsem_down_write_failed_state, except that it doesn't
>need new exports, just a visible __down_write_killable.
>
>Architectures which are not using the generic rwsem implementation are
>supposed to provide their __down_write_killable implementation and
>use rwsem_down_write_failed_killable for the slow path.

So in a nutshell, this is supposed to be the (writer) rwsem counterpart of
mutex_lock_killable() and down_killable(), right?

[...]

>--- a/kernel/locking/rwsem-xadd.c
>+++ b/kernel/locking/rwsem-xadd.c
>@@ -433,12 +433,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
> /*
>  * Wait until we successfully acquire the write lock
>  */
>-__visible
>-struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
>+static inline struct rw_semaphore *
>+__rwsem_down_write_failed_state(struct rw_semaphore *sem, int state)

FWIW I'm not a fan of the _state naming. While I understand why you chose it, I feel
it does not really describe the purpose of the call itself. The state logic alone is
really quite small and therefore should not govern the function name. Why not just apply
KISS and label things _common, i.e. like mutexes do? This would also standardize names a
bit.

> {
> 	long count;
> 	bool waiting = true; /* any queued threads before us */
> 	struct rwsem_waiter waiter;
>+	struct rw_semaphore *ret = sem;
>
> 	/* undo write bias from down_write operation, stop active locking */
> 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
>@@ -478,7 +479,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
> 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
>
> 	/* wait until we successfully acquire the lock */
>-	set_current_state(TASK_UNINTERRUPTIBLE);
>+	set_current_state(state);
> 	while (true) {
> 		if (rwsem_try_write_lock(count, sem))
> 			break;
>@@ -486,21 +487,39 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
>
> 		/* Block until there are no active lockers. */
> 		do {
>+			if (signal_pending_state(state, current)) {

                           ^^ unlikely()?

>+				raw_spin_lock_irq(&sem->wait_lock);

If the lock is highly contended + a bad workload for spin-on-owner, this could take a while :)
Of course, this is a side effect of the "wait until no active lockers" optimization which avoids
the wait_lock in the first place, so fortunately it somewhat mitigates the situation.

>+				ret = ERR_PTR(-EINTR);
>+				goto out;
>+			}
> 			schedule();
>-			set_current_state(TASK_UNINTERRUPTIBLE);
>+			set_current_state(state);
> 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
>
> 		raw_spin_lock_irq(&sem->wait_lock);
> 	}
>+out:
> 	__set_current_state(TASK_RUNNING);
>-

You certainly don't want this if exiting due to the TASK_KILLABLE situation.

> 	list_del(&waiter.list);
> 	raw_spin_unlock_irq(&sem->wait_lock);
>
>-	return sem;
>+	return ret;
>+}

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  2016-04-02  4:41   ` Davidlohr Bueso
  2016-05-10 10:43   ` Tetsuo Handa
  0 siblings, 2 replies; 52+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce a generic implementation necessary for down_write_killable.
This is a trivial extension of the already existing down_write call,
one which can be interrupted by SIGKILL.  This patch doesn't provide
down_write_killable yet because arches have to provide the necessary
pieces first.

rwsem_down_write_failed, which is the generic slow path for the
write lock, is extended to take a task state and renamed to
__rwsem_down_write_failed_state. The return value is either a valid
semaphore pointer or ERR_PTR(-EINTR).

rwsem_down_write_failed_killable is exported as a new way to wait for
the lock and be killable.

For the rwsem-spinlock implementation the current __down_write is updated
in a similar way to __rwsem_down_write_failed_state, except that it doesn't
need new exports, just a visible __down_write_killable.

Architectures which are not using the generic rwsem implementation are
supposed to provide their __down_write_killable implementation and
use rwsem_down_write_failed_killable for the slow path.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/asm-generic/rwsem.h     | 12 ++++++++++++
 include/linux/rwsem-spinlock.h  |  1 +
 include/linux/rwsem.h           |  2 ++
 kernel/locking/rwsem-spinlock.c | 22 ++++++++++++++++++++--
 kernel/locking/rwsem-xadd.c     | 31 +++++++++++++++++++++++++------
 5 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index b8d8a6cf4ca8..3fc94a046bf5 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -63,6 +63,18 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	long tmp;
+
+	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
+				     (atomic_long_t *)&sem->count);
+	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
diff --git a/include/linux/rwsem-spinlock.h b/include/linux/rwsem-spinlock.h
index a733a5467e6c..ae0528b834cd 100644
--- a/include/linux/rwsem-spinlock.h
+++ b/include/linux/rwsem-spinlock.h
@@ -34,6 +34,7 @@ struct rw_semaphore {
 extern void __down_read(struct rw_semaphore *sem);
 extern int __down_read_trylock(struct rw_semaphore *sem);
 extern void __down_write(struct rw_semaphore *sem);
+extern int __must_check __down_write_killable(struct rw_semaphore *sem);
 extern int __down_write_trylock(struct rw_semaphore *sem);
 extern void __up_read(struct rw_semaphore *sem);
 extern void __up_write(struct rw_semaphore *sem);
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 8f498cdde280..7d7ae029dac5 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -14,6 +14,7 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/atomic.h>
+#include <linux/err.h>
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 #include <linux/osq_lock.h>
 #endif
@@ -43,6 +44,7 @@ struct rw_semaphore {
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index bab26104a5d0..fb2db7b408f0 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -191,11 +191,12 @@ int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * get a write lock on the semaphore
  */
-void __sched __down_write(struct rw_semaphore *sem)
+int __sched __down_write_state(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
 	unsigned long flags;
+	int ret = 0;
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
@@ -215,16 +216,33 @@ void __sched __down_write(struct rw_semaphore *sem)
 		 */
 		if (sem->count == 0)
 			break;
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		if (signal_pending_state(state, current)) {
+			ret = -EINTR;
+			goto out;
+		}
+		set_task_state(tsk, state);
 		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 		schedule();
 		raw_spin_lock_irqsave(&sem->wait_lock, flags);
 	}
 	/* got the lock */
 	sem->count = -1;
+out:
 	list_del(&waiter.list);
 
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+
+	return ret;
+}
+
+void __sched __down_write(struct rw_semaphore *sem)
+{
+	__down_write_state(sem, TASK_UNINTERRUPTIBLE);
+}
+
+int __sched __down_write_killable(struct rw_semaphore *sem)
+{
+	return __down_write_state(sem, TASK_KILLABLE);
 }
 
 /*
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index a4d4de05b2d1..781b2628e41b 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -433,12 +433,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 /*
  * Wait until we successfully acquire the write lock
  */
-__visible
-struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
+static inline struct rw_semaphore *
+__rwsem_down_write_failed_state(struct rw_semaphore *sem, int state)
 {
 	long count;
 	bool waiting = true; /* any queued threads before us */
 	struct rwsem_waiter waiter;
+	struct rw_semaphore *ret = sem;
 
 	/* undo write bias from down_write operation, stop active locking */
 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
@@ -478,7 +479,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
 
 	/* wait until we successfully acquire the lock */
-	set_current_state(TASK_UNINTERRUPTIBLE);
+	set_current_state(state);
 	while (true) {
 		if (rwsem_try_write_lock(count, sem))
 			break;
@@ -486,21 +487,39 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 
 		/* Block until there are no active lockers. */
 		do {
+			if (signal_pending_state(state, current)) {
+				raw_spin_lock_irq(&sem->wait_lock);
+				ret = ERR_PTR(-EINTR);
+				goto out;
+			}
 			schedule();
-			set_current_state(TASK_UNINTERRUPTIBLE);
+			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
+out:
 	__set_current_state(TASK_RUNNING);
-
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
-	return sem;
+	return ret;
+}
+
+__visible struct rw_semaphore * __sched
+rwsem_down_write_failed(struct rw_semaphore *sem)
+{
+	return __rwsem_down_write_failed_state(sem, TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(rwsem_down_write_failed);
 
+__visible struct rw_semaphore * __sched
+rwsem_down_write_failed_killable(struct rw_semaphore *sem)
+{
+	return __rwsem_down_write_failed_state(sem, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(rwsem_down_write_failed_killable);
+
 /*
  * handle waking up a waiter on the semaphore
  * - up_read/up_write has decremented the active part of count if we come here
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2016-05-12 19:42 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-29 12:58 [PATCH 0/11] introduce down_write_killable for rw_semaphore Michal Hocko
2016-02-29 12:58 ` [PATCH 01/11] locking, rwsem: get rid of __down_write_nested Michal Hocko
2016-02-29 12:58 ` [PATCH 02/11] locking, rwsem: drop explicit memory barriers Michal Hocko
2016-02-29 12:58 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
2016-03-30 13:25   ` Peter Zijlstra
2016-03-31  8:33     ` Michal Hocko
2016-03-31  8:44       ` Peter Zijlstra
2016-03-31  8:55   ` [PATCH] " Michal Hocko
2016-02-29 12:58 ` [PATCH 04/11] alpha, rwsem: provide __down_write_killable Michal Hocko
2016-02-29 12:58 ` [PATCH 05/11] ia64, " Michal Hocko
2016-02-29 12:58 ` [PATCH 06/11] s390, " Michal Hocko
2016-02-29 12:58 ` [PATCH 07/11] sh, " Michal Hocko
2016-02-29 12:58 ` [PATCH 08/11] sparc, " Michal Hocko
2016-02-29 12:58 ` [PATCH 09/11] xtensa, " Michal Hocko
2016-02-29 12:58 ` [PATCH 10/11] x86, " Michal Hocko
2016-02-29 12:58 ` [PATCH 11/11] locking, rwsem: provide down_write_killable Michal Hocko
2016-03-30 13:32 ` [PATCH 0/11] introduce down_write_killable for rw_semaphore Peter Zijlstra
2016-03-31  8:59   ` Michal Hocko
2016-03-31  9:20     ` Ingo Molnar
2016-03-31 10:58       ` Michal Hocko
2016-03-31 17:03       ` Andrew Morton
2016-04-01  6:33         ` Ingo Molnar
2016-04-01  9:21           ` Michal Hocko
2016-04-01  9:50             ` Ingo Molnar
2016-04-01 10:52               ` Michal Hocko
2016-04-01  7:26         ` Michal Hocko
2016-04-01  9:11           ` Andrew Morton
2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
2016-04-01 11:04 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
2016-04-02  4:41   ` Davidlohr Bueso
2016-04-04  9:17     ` Michal Hocko
2016-04-04  9:21       ` Peter Zijlstra
2016-04-07  6:58       ` Davidlohr Bueso
2016-04-07  7:38         ` Michal Hocko
2016-05-10 10:43   ` Tetsuo Handa
2016-05-10 11:53     ` Michal Hocko
2016-05-10 12:38       ` Peter Zijlstra
2016-05-10 13:57         ` Tetsuo Handa
2016-05-11  7:23         ` Michal Hocko
2016-05-11  8:28           ` Michal Hocko
2016-05-11  8:44             ` Peter Zijlstra
2016-05-11  9:04               ` Michal Hocko
2016-05-11  9:17                 ` Peter Zijlstra
2016-05-11  9:31                   ` Michal Hocko
2016-05-11  9:41                     ` Peter Zijlstra
2016-05-11 13:59                       ` Michal Hocko
2016-05-11 18:03                         ` Michal Hocko
2016-05-12 12:12                           ` Peter Zijlstra
2016-05-12 12:19                             ` Michal Hocko
2016-05-12 13:58                               ` Peter Zijlstra
2016-05-12 19:42                               ` Waiman Long
2016-05-11  8:35           ` Peter Zijlstra
2016-05-11  9:02             ` Michal Hocko
