* [PATCH 0/11] introduce down_write_killable for rw_semaphore v2
@ 2016-04-01 11:04 Michal Hocko
  2016-04-01 11:04 ` [PATCH 01/11] locking, rwsem: get rid of __down_write_nested Michal Hocko
                   ` (10 more replies)
  0 siblings, 11 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

Hi,
the following patchset implements a killable variant of the write lock
for rw_semaphore. My usecase is to convert as many mmap_sem write users
as possible to the killable variant, which will help the oom_reaper
merged in 4.6-rc1 (aac453635549 ("mm, oom: introduce oom reaper")) to
asynchronously tear down the oom victim's address space, an operation
which requires mmap_sem for read. This will reduce the likelihood of
OOM livelocks caused by the oom victim being stuck on a lock or another
resource which prevents it from reaching its exit path and releasing
the memory. I haven't implemented a killable variant of the read lock
because I do not have any usecase for such an API.
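
To make this concrete, here is a minimal sketch of what a converted
mmap_sem write user could look like (illustrative only: the function is
hypothetical and the actual conversions are not part of this series):

	static int frob_address_space(struct mm_struct *mm)
	{
		if (down_write_killable(&mm->mmap_sem))
			return -EINTR;	/* killed while waiting for the lock */
		/* ... modify the address space ... */
		up_write(&mm->mmap_sem);
		return 0;
	}

A waiter killed this way bails out with -EINTR and can proceed to its
exit path and release its memory instead of blocking indefinitely.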

The previous version of this series was posted at [1]. The only changes
since then are a rebase on top of 4.5 (with a clean merge to 4.6-rc1)
and an update to patch 3 which now checks for fatal signals only when
the caller would block - i.e. trylock will get the lock even when
signals are pending, to stay in sync with the mutex implementation as
per Peter Zijlstra.
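
In other words, the agreed-on semantics look roughly like this (a
sketch, not code from any of the patches):

	if (!down_write_trylock(&sem)) {
		/*
		 * Only here, where the caller would block, can a
		 * pending SIGKILL make the lock attempt bail out.
		 */
		if (down_write_killable(&sem))
			return -EINTR;
	}
	/* ... write-side critical section ... */
	up_write(&sem);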

The patchset is organized as follows.
- Patch 1 is a trivial cleanup.
- Patch 2, I believe, shouldn't introduce any functional changes as per
  Documentation/memory-barriers.txt.
- Patch 3 is the preparatory work and necessary infrastructure for
  down_write_killable. It implements a generic __down_write_killable
  and prepares the write lock slow path to bail out early when told so.
- Patches 4-10 implement the arch specific __down_write_killable, one
  patch per architecture.
- Finally, patch 11 implements down_write_killable and ties everything
  together. I am not really an expert on lockdep so I hope I got it
  right.

Follow-up patches converting mmap_sem write users to the killable form
are not part of this series because they will be routed via the mmotm
tree later on.

I have tested on x86 with OOM situations under high mmap_sem contention
(basically many parallel page faults racing with many parallel tight
mmap/munmap loops), so the waiters for the write lock are routinely
interrupted by SIGKILL.
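
For reference, a rough userspace sketch of the kind of stress load
described above (the exact test is not part of this posting; sizes,
task counts and timeouts are arbitrary):

	#include <signal.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define SZ	(4 << 20)

	/* tight mmap/munmap loop: takes mmap_sem for write */
	static void mapper(void)
	{
		for (;;) {
			void *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
				       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
			if (p != MAP_FAILED)
				munmap(p, SZ);
		}
	}

	/* parallel page faults: takes mmap_sem for read */
	static void faulter(char *mem)
	{
		for (;;) {
			memset(mem, 1, SZ);		 /* fault pages in */
			madvise(mem, SZ, MADV_DONTNEED); /* drop them again */
		}
	}

	int main(void)
	{
		char *mem = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (mem == MAP_FAILED)
			return 1;
		for (int i = 0; i < 64; i++) {
			if (fork() == 0) {
				if (i & 1)
					mapper();
				else
					faulter(mem);
			}
		}
		sleep(10);
		kill(0, SIGKILL);	/* kill the whole group, waiters included */
		return 0;
	}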

Thanks!

Shortlog says:
Michal Hocko (11):
      locking, rwsem: get rid of __down_write_nested
      locking, rwsem: drop explicit memory barriers
      locking, rwsem: introduce basis for down_write_killable
      alpha, rwsem: provide __down_write_killable
      ia64, rwsem: provide __down_write_killable
      s390, rwsem: provide __down_write_killable
      sh, rwsem: provide __down_write_killable
      sparc, rwsem: provide __down_write_killable
      xtensa, rwsem: provide __down_write_killable
      x86, rwsem: provide __down_write_killable
      locking, rwsem: provide down_write_killable

And diffstat:
 arch/alpha/include/asm/rwsem.h  | 18 ++++++++++++++++--
 arch/ia64/include/asm/rwsem.h   | 22 ++++++++++++++++++---
 arch/s390/include/asm/rwsem.h   | 18 ++++++++++++++----
 arch/sh/include/asm/rwsem.h     | 32 +++++++++++++++----------------
 arch/sparc/include/asm/rwsem.h  | 14 +++++++++++---
 arch/x86/include/asm/rwsem.h    | 42 +++++++++++++++++++++++++----------------
 arch/x86/lib/rwsem.S            |  8 ++++++++
 arch/xtensa/include/asm/rwsem.h | 27 ++++++++++++++------------
 include/asm-generic/rwsem.h     | 13 ++++++++++---
 include/linux/lockdep.h         | 15 +++++++++++++++
 include/linux/rwsem-spinlock.h  |  2 +-
 include/linux/rwsem.h           |  3 +++
 kernel/locking/rwsem-spinlock.c | 19 ++++++++++++++++---
 kernel/locking/rwsem-xadd.c     | 31 ++++++++++++++++++++++++------
 kernel/locking/rwsem.c          | 19 +++++++++++++++++++
 15 files changed, 213 insertions(+), 70 deletions(-)

---
[1] http://lkml.kernel.org/r/1456750705-7141-1-git-send-email-mhocko@kernel.org


* [PATCH 01/11] locking, rwsem: get rid of __down_write_nested
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  2016-04-02  0:28   ` Davidlohr Bueso
  2016-04-01 11:04 ` [PATCH 02/11] locking, rwsem: drop explicit memory barriers Michal Hocko
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

This is no longer used anywhere and all callers (__down_write) use
0 as a subclass. Ditch __down_write_nested to make the code easier
to follow.

This shouldn't introduce any functional change.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/include/asm/rwsem.h   | 7 +------
 arch/sh/include/asm/rwsem.h     | 5 -----
 arch/sparc/include/asm/rwsem.h  | 7 +------
 arch/x86/include/asm/rwsem.h    | 7 +------
 include/asm-generic/rwsem.h     | 7 +------
 include/linux/rwsem-spinlock.h  | 1 -
 kernel/locking/rwsem-spinlock.c | 7 +------
 7 files changed, 5 insertions(+), 36 deletions(-)

diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
index 4b43ee7e6776..8e52e72f3efc 100644
--- a/arch/s390/include/asm/rwsem.h
+++ b/arch/s390/include/asm/rwsem.h
@@ -90,7 +90,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	signed long old, new, tmp;
 
@@ -108,11 +108,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		rwsem_down_write_failed(sem);
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
index edab57265293..a5104bebd1eb 100644
--- a/arch/sh/include/asm/rwsem.h
+++ b/arch/sh/include/asm/rwsem.h
@@ -114,11 +114,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 		rwsem_downgrade_wake(sem);
 }
 
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
-{
-	__down_write(sem);
-}
-
 /*
  * implement exchange and add functionality
  */
diff --git a/arch/sparc/include/asm/rwsem.h b/arch/sparc/include/asm/rwsem.h
index 069bf4d663a1..e5a0d575bc7f 100644
--- a/arch/sparc/include/asm/rwsem.h
+++ b/arch/sparc/include/asm/rwsem.h
@@ -45,7 +45,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	long tmp;
 
@@ -55,11 +55,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		rwsem_down_write_failed(sem);
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index cad82c9c2fde..d79a218675bc 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -99,7 +99,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	long tmp;
 	asm volatile("# beginning down_write\n\t"
@@ -116,11 +116,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		     : "memory", "cc");
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index d6d5dc98d7da..b8d8a6cf4ca8 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -53,7 +53,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	long tmp;
 
@@ -63,11 +63,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		rwsem_down_write_failed(sem);
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
diff --git a/include/linux/rwsem-spinlock.h b/include/linux/rwsem-spinlock.h
index 561e8615528d..a733a5467e6c 100644
--- a/include/linux/rwsem-spinlock.h
+++ b/include/linux/rwsem-spinlock.h
@@ -34,7 +34,6 @@ struct rw_semaphore {
 extern void __down_read(struct rw_semaphore *sem);
 extern int __down_read_trylock(struct rw_semaphore *sem);
 extern void __down_write(struct rw_semaphore *sem);
-extern void __down_write_nested(struct rw_semaphore *sem, int subclass);
 extern int __down_write_trylock(struct rw_semaphore *sem);
 extern void __up_read(struct rw_semaphore *sem);
 extern void __up_write(struct rw_semaphore *sem);
diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index 3a5048572065..bab26104a5d0 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -191,7 +191,7 @@ int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * get a write lock on the semaphore
  */
-void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
+void __sched __down_write(struct rw_semaphore *sem)
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
@@ -227,11 +227,6 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 }
 
-void __sched __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
-- 
2.8.0.rc3


* [PATCH 02/11] locking, rwsem: drop explicit memory barriers
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
  2016-04-01 11:04 ` [PATCH 01/11] locking, rwsem: get rid of __down_write_nested Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  2016-04-02  1:17   ` Davidlohr Bueso
  2016-04-01 11:04 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

sh and xtensa seem to be the only architectures which use explicit
memory barriers for rw_semaphore operations even though they are not
really needed because a full memory barrier is always implied by
atomic_{inc,dec,add,sub}_return resp. cmpxchg. Remove them.
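
The reasoning, spelled out (an illustration of the documented atomic
semantics, not code from this patch):

	/*
	 * Per Documentation/memory-barriers.txt, value-returning atomics
	 * such as atomic_add_return() imply a full smp_mb() both before
	 * and after the operation, so
	 */
	if (atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
			      (atomic_t *)(&sem->count)) == RWSEM_ACTIVE_WRITE_BIAS)
		smp_wmb();	/* redundant: already ordered by the atomic op */
	else
		rwsem_down_write_failed(sem);

	/* can safely collapse into */
	if (atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
			      (atomic_t *)(&sem->count)) != RWSEM_ACTIVE_WRITE_BIAS)
		rwsem_down_write_failed(sem);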

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/sh/include/asm/rwsem.h     | 14 ++------------
 arch/xtensa/include/asm/rwsem.h | 14 ++------------
 2 files changed, 4 insertions(+), 24 deletions(-)

diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
index a5104bebd1eb..f6c951c7a875 100644
--- a/arch/sh/include/asm/rwsem.h
+++ b/arch/sh/include/asm/rwsem.h
@@ -24,9 +24,7 @@
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	if (atomic_inc_return((atomic_t *)(&sem->count)) > 0)
-		smp_wmb();
-	else
+	if (atomic_inc_return((atomic_t *)(&sem->count)) <= 0)
 		rwsem_down_read_failed(sem);
 }
 
@@ -37,7 +35,6 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	while ((tmp = sem->count) >= 0) {
 		if (tmp == cmpxchg(&sem->count, tmp,
 				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			smp_wmb();
 			return 1;
 		}
 	}
@@ -53,9 +50,7 @@ static inline void __down_write(struct rw_semaphore *sem)
 
 	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
 				(atomic_t *)(&sem->count));
-	if (tmp == RWSEM_ACTIVE_WRITE_BIAS)
-		smp_wmb();
-	else
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
 		rwsem_down_write_failed(sem);
 }
 
@@ -65,7 +60,6 @@ static inline int __down_write_trylock(struct rw_semaphore *sem)
 
 	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
 		      RWSEM_ACTIVE_WRITE_BIAS);
-	smp_wmb();
 	return tmp == RWSEM_UNLOCKED_VALUE;
 }
 
@@ -76,7 +70,6 @@ static inline void __up_read(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_dec_return((atomic_t *)(&sem->count));
 	if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
 		rwsem_wake(sem);
@@ -87,7 +80,6 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
-	smp_wmb();
 	if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
 			      (atomic_t *)(&sem->count)) < 0)
 		rwsem_wake(sem);
@@ -108,7 +100,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
 	if (tmp < 0)
 		rwsem_downgrade_wake(sem);
@@ -119,7 +110,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
  */
 static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
 {
-	smp_mb();
 	return atomic_add_return(delta, (atomic_t *)(&sem->count));
 }
 
diff --git a/arch/xtensa/include/asm/rwsem.h b/arch/xtensa/include/asm/rwsem.h
index 249619e7e7f2..593483f6e1ff 100644
--- a/arch/xtensa/include/asm/rwsem.h
+++ b/arch/xtensa/include/asm/rwsem.h
@@ -29,9 +29,7 @@
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	if (atomic_add_return(1,(atomic_t *)(&sem->count)) > 0)
-		smp_wmb();
-	else
+	if (atomic_add_return(1,(atomic_t *)(&sem->count)) <= 0)
 		rwsem_down_read_failed(sem);
 }
 
@@ -42,7 +40,6 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	while ((tmp = sem->count) >= 0) {
 		if (tmp == cmpxchg(&sem->count, tmp,
 				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			smp_wmb();
 			return 1;
 		}
 	}
@@ -58,9 +55,7 @@ static inline void __down_write(struct rw_semaphore *sem)
 
 	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
 				(atomic_t *)(&sem->count));
-	if (tmp == RWSEM_ACTIVE_WRITE_BIAS)
-		smp_wmb();
-	else
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
 		rwsem_down_write_failed(sem);
 }
 
@@ -70,7 +65,6 @@ static inline int __down_write_trylock(struct rw_semaphore *sem)
 
 	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
 		      RWSEM_ACTIVE_WRITE_BIAS);
-	smp_wmb();
 	return tmp == RWSEM_UNLOCKED_VALUE;
 }
 
@@ -81,7 +75,6 @@ static inline void __up_read(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_sub_return(1,(atomic_t *)(&sem->count));
 	if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
 		rwsem_wake(sem);
@@ -92,7 +85,6 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
-	smp_wmb();
 	if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
 			      (atomic_t *)(&sem->count)) < 0)
 		rwsem_wake(sem);
@@ -113,7 +105,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
 	if (tmp < 0)
 		rwsem_downgrade_wake(sem);
@@ -124,7 +115,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
  */
 static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
 {
-	smp_mb();
 	return atomic_add_return(delta, (atomic_t *)(&sem->count));
 }
 
-- 
2.8.0.rc3


* [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
  2016-04-01 11:04 ` [PATCH 01/11] locking, rwsem: get rid of __down_write_nested Michal Hocko
  2016-04-01 11:04 ` [PATCH 02/11] locking, rwsem: drop explicit memory barriers Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  2016-04-02  4:41   ` Davidlohr Bueso
  2016-05-10 10:43   ` Tetsuo Handa
  2016-04-01 11:04 ` [PATCH 04/11] alpha, rwsem: provide __down_write_killable Michal Hocko
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce a generic implementation necessary for down_write_killable.
This is a trivial extension of the already existing down_write call
which can be interrupted by SIGKILL. This patch doesn't provide
down_write_killable yet because arches have to provide the necessary
pieces first.

rwsem_down_write_failed, which is the generic slow path for the
write lock, is extended to take a task state and renamed to
__rwsem_down_write_failed_state. The return value is either a valid
semaphore pointer or ERR_PTR(-EINTR).

rwsem_down_write_failed_killable is exported as a new way to wait for
the lock and be killable.

For the rwsem-spinlock implementation the current __down_write is
updated in a similar way to __rwsem_down_write_failed_state, except it
doesn't need new exports, just a visible __down_write_killable.

Architectures which are not using the generic rwsem implementation are
supposed to provide their own __down_write_killable implementation and
use rwsem_down_write_failed_killable for the slow path.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/asm-generic/rwsem.h     | 12 ++++++++++++
 include/linux/rwsem-spinlock.h  |  1 +
 include/linux/rwsem.h           |  2 ++
 kernel/locking/rwsem-spinlock.c | 22 ++++++++++++++++++++--
 kernel/locking/rwsem-xadd.c     | 31 +++++++++++++++++++++++++------
 5 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index b8d8a6cf4ca8..3fc94a046bf5 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -63,6 +63,18 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	long tmp;
+
+	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
+				     (atomic_long_t *)&sem->count);
+	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
diff --git a/include/linux/rwsem-spinlock.h b/include/linux/rwsem-spinlock.h
index a733a5467e6c..ae0528b834cd 100644
--- a/include/linux/rwsem-spinlock.h
+++ b/include/linux/rwsem-spinlock.h
@@ -34,6 +34,7 @@ struct rw_semaphore {
 extern void __down_read(struct rw_semaphore *sem);
 extern int __down_read_trylock(struct rw_semaphore *sem);
 extern void __down_write(struct rw_semaphore *sem);
+extern int __must_check __down_write_killable(struct rw_semaphore *sem);
 extern int __down_write_trylock(struct rw_semaphore *sem);
 extern void __up_read(struct rw_semaphore *sem);
 extern void __up_write(struct rw_semaphore *sem);
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 8f498cdde280..7d7ae029dac5 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -14,6 +14,7 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/atomic.h>
+#include <linux/err.h>
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 #include <linux/osq_lock.h>
 #endif
@@ -43,6 +44,7 @@ struct rw_semaphore {
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index bab26104a5d0..fb2db7b408f0 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -191,11 +191,12 @@ int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * get a write lock on the semaphore
  */
-void __sched __down_write(struct rw_semaphore *sem)
+int __sched __down_write_state(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
 	unsigned long flags;
+	int ret = 0;
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
@@ -215,16 +216,33 @@ void __sched __down_write(struct rw_semaphore *sem)
 		 */
 		if (sem->count == 0)
 			break;
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		if (signal_pending_state(state, current)) {
+			ret = -EINTR;
+			goto out;
+		}
+		set_task_state(tsk, state);
 		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 		schedule();
 		raw_spin_lock_irqsave(&sem->wait_lock, flags);
 	}
 	/* got the lock */
 	sem->count = -1;
+out:
 	list_del(&waiter.list);
 
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+
+	return ret;
+}
+
+void __sched __down_write(struct rw_semaphore *sem)
+{
+	__down_write_state(sem, TASK_UNINTERRUPTIBLE);
+}
+
+int __sched __down_write_killable(struct rw_semaphore *sem)
+{
+	return __down_write_state(sem, TASK_KILLABLE);
 }
 
 /*
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index a4d4de05b2d1..781b2628e41b 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -433,12 +433,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 /*
  * Wait until we successfully acquire the write lock
  */
-__visible
-struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
+static inline struct rw_semaphore *
+__rwsem_down_write_failed_state(struct rw_semaphore *sem, int state)
 {
 	long count;
 	bool waiting = true; /* any queued threads before us */
 	struct rwsem_waiter waiter;
+	struct rw_semaphore *ret = sem;
 
 	/* undo write bias from down_write operation, stop active locking */
 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
@@ -478,7 +479,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
 
 	/* wait until we successfully acquire the lock */
-	set_current_state(TASK_UNINTERRUPTIBLE);
+	set_current_state(state);
 	while (true) {
 		if (rwsem_try_write_lock(count, sem))
 			break;
@@ -486,21 +487,39 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 
 		/* Block until there are no active lockers. */
 		do {
+			if (signal_pending_state(state, current)) {
+				raw_spin_lock_irq(&sem->wait_lock);
+				ret = ERR_PTR(-EINTR);
+				goto out;
+			}
 			schedule();
-			set_current_state(TASK_UNINTERRUPTIBLE);
+			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
+out:
 	__set_current_state(TASK_RUNNING);
-
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
-	return sem;
+	return ret;
+}
+
+__visible struct rw_semaphore * __sched
+rwsem_down_write_failed(struct rw_semaphore *sem)
+{
+	return __rwsem_down_write_failed_state(sem, TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(rwsem_down_write_failed);
 
+__visible struct rw_semaphore * __sched
+rwsem_down_write_failed_killable(struct rw_semaphore *sem)
+{
+	return __rwsem_down_write_failed_state(sem, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(rwsem_down_write_failed_killable);
+
 /*
  * handle waking up a waiter on the semaphore
  * - up_read/up_write has decremented the active part of count if we come here
-- 
2.8.0.rc3


* [PATCH 04/11] alpha, rwsem: provide __down_write_killable
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
                   ` (2 preceding siblings ...)
  2016-04-01 11:04 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  2016-04-01 11:04 ` [PATCH 05/11] ia64, " Michal Hocko
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce ___down_write for the fast path and reuse it for __down_write
resp. __down_write_killable, each using the respective generic slow path
(rwsem_down_write_failed resp. rwsem_down_write_failed_killable).

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/alpha/include/asm/rwsem.h | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
index a83bbea62c67..0131a7058778 100644
--- a/arch/alpha/include/asm/rwsem.h
+++ b/arch/alpha/include/asm/rwsem.h
@@ -63,7 +63,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	return res >= 0 ? 1 : 0;
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
+static inline long ___down_write(struct rw_semaphore *sem)
 {
 	long oldcount;
 #ifndef	CONFIG_SMP
@@ -83,10 +83,24 @@ static inline void __down_write(struct rw_semaphore *sem)
 	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
 	:"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory");
 #endif
-	if (unlikely(oldcount))
+	return oldcount;
+}
+
+static inline void __down_write(struct rw_semaphore *sem)
+{
+	if (unlikely(___down_write(sem)))
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (unlikely(___down_write(sem)))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
-- 
2.8.0.rc3


* [PATCH 05/11] ia64, rwsem: provide __down_write_killable
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
                   ` (3 preceding siblings ...)
  2016-04-01 11:04 ` [PATCH 04/11] alpha, rwsem: provide __down_write_killable Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  2016-04-01 11:04 ` [PATCH 06/11] s390, " Michal Hocko
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce ___down_write for the fast path and reuse it for __down_write
resp. __down_write_killable, each using the respective generic slow path
(rwsem_down_write_failed resp. rwsem_down_write_failed_killable).

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/ia64/include/asm/rwsem.h | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/ia64/include/asm/rwsem.h b/arch/ia64/include/asm/rwsem.h
index 3027e7516d85..5e78cb40d9df 100644
--- a/arch/ia64/include/asm/rwsem.h
+++ b/arch/ia64/include/asm/rwsem.h
@@ -49,8 +49,8 @@ __down_read (struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void
-__down_write (struct rw_semaphore *sem)
+static inline long
+___down_write (struct rw_semaphore *sem)
 {
 	long old, new;
 
@@ -59,10 +59,26 @@ __down_write (struct rw_semaphore *sem)
 		new = old + RWSEM_ACTIVE_WRITE_BIAS;
 	} while (cmpxchg_acq(&sem->count, old, new) != old);
 
-	if (old != 0)
+	return old;
+}
+
+static inline void
+__down_write (struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
 		rwsem_down_write_failed(sem);
 }
 
+static inline int
+__down_write_killable (struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 /*
  * unlock after reading
  */
-- 
2.8.0.rc3


* [PATCH 06/11] s390, rwsem: provide __down_write_killable
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
                   ` (4 preceding siblings ...)
  2016-04-01 11:04 ` [PATCH 05/11] ia64, " Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  2016-04-01 11:04 ` [PATCH 07/11] sh, " Michal Hocko
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce ___down_write for the fast path and reuse it for __down_write
resp. __down_write_killable, each using the respective generic slow path
(rwsem_down_write_failed resp. rwsem_down_write_failed_killable).

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/include/asm/rwsem.h | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
index 8e52e72f3efc..8e7b2b7e10f3 100644
--- a/arch/s390/include/asm/rwsem.h
+++ b/arch/s390/include/asm/rwsem.h
@@ -90,7 +90,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write(struct rw_semaphore *sem)
+static inline long ___down_write(struct rw_semaphore *sem)
 {
 	signed long old, new, tmp;
 
@@ -104,10 +104,25 @@ static inline void __down_write(struct rw_semaphore *sem)
 		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
 		: "Q" (sem->count), "m" (tmp)
 		: "cc", "memory");
-	if (old != 0)
+
+	return old;
+}
+
+static inline void __down_write(struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
-- 
2.8.0.rc3


* [PATCH 07/11] sh, rwsem: provide __down_write_killable
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
                   ` (5 preceding siblings ...)
  2016-04-01 11:04 ` [PATCH 06/11] s390, " Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  2016-04-01 11:04 ` [PATCH 08/11] sparc, " Michal Hocko
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

which uses the same fast path as __down_write except it falls back to
the rwsem_down_write_failed_killable slow path and returns -EINTR if
killed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/sh/include/asm/rwsem.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
index f6c951c7a875..8a457b83d2a5 100644
--- a/arch/sh/include/asm/rwsem.h
+++ b/arch/sh/include/asm/rwsem.h
@@ -54,6 +54,19 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
+				(atomic_t *)(&sem->count));
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	int tmp;
-- 
2.8.0.rc3


* [PATCH 08/11] sparc, rwsem: provide __down_write_killable
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
                   ` (6 preceding siblings ...)
  2016-04-01 11:04 ` [PATCH 07/11] sh, " Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  2016-04-01 11:04 ` [PATCH 09/11] xtensa, " Michal Hocko
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

which uses the same fast path as __down_write except it falls back to
the rwsem_down_write_failed_killable slow path and returns -EINTR if
killed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/sparc/include/asm/rwsem.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/sparc/include/asm/rwsem.h b/arch/sparc/include/asm/rwsem.h
index e5a0d575bc7f..c1e9e32e8b63 100644
--- a/arch/sparc/include/asm/rwsem.h
+++ b/arch/sparc/include/asm/rwsem.h
@@ -55,6 +55,19 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	long tmp;
+
+	tmp = atomic64_add_return(RWSEM_ACTIVE_WRITE_BIAS,
+				  (atomic64_t *)(&sem->count));
+	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
-- 
2.8.0.rc3


* [PATCH 09/11] xtensa, rwsem: provide __down_write_killable
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
                   ` (7 preceding siblings ...)
  2016-04-01 11:04 ` [PATCH 08/11] sparc, " Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  2016-04-01 11:04 ` [PATCH 10/11] x86, " Michal Hocko
  2016-04-01 11:04 ` [PATCH 11/11] locking, rwsem: provide down_write_killable Michal Hocko
  10 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

which uses the same fast path as __down_write except it falls back to
the rwsem_down_write_failed_killable slow path and returns -EINTR if
killed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/xtensa/include/asm/rwsem.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/xtensa/include/asm/rwsem.h b/arch/xtensa/include/asm/rwsem.h
index 593483f6e1ff..6283823b8040 100644
--- a/arch/xtensa/include/asm/rwsem.h
+++ b/arch/xtensa/include/asm/rwsem.h
@@ -59,6 +59,19 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
+				(atomic_t *)(&sem->count));
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	int tmp;
-- 
2.8.0.rc3


* [PATCH 10/11] x86, rwsem: provide __down_write_killable
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
                   ` (8 preceding siblings ...)
  2016-04-01 11:04 ` [PATCH 09/11] xtensa, " Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  2016-04-06 18:31   ` Peter Zijlstra
  2016-04-01 11:04 ` [PATCH 11/11] locking, rwsem: provide down_write_killable Michal Hocko
  10 siblings, 1 reply; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

which uses the same fast path as __down_write except it falls back to
the call_rwsem_down_write_failed_killable slow path and returns -EINTR
if killed. To prevent code duplication, extract the skeleton of
__down_write into a helper macro which just takes the semaphore
and the slow path function to be called.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/x86/include/asm/rwsem.h | 41 ++++++++++++++++++++++++++++-------------
 arch/x86/lib/rwsem.S         |  8 ++++++++
 2 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index d79a218675bc..4c3d90dbe89a 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -99,21 +99,36 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
+#define ____down_write(sem, slow_path)			\
+({							\
+	long tmp;					\
+	struct rw_semaphore* ret = sem;			\
+	asm volatile("# beginning down_write\n\t"	\
+		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"	\
+		     /* adds 0xffff0001, returns the old value */ \
+		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
+		     /* was the active mask 0 before? */\
+		     "  jz        1f\n"			\
+		     "  call " slow_path "\n"		\
+		     "1:\n"				\
+		     "# ending down_write"		\
+		     : "+m" (sem->count), "=d" (tmp), "+a" (ret)	\
+		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
+		     : "memory", "cc");			\
+	ret;						\
+})
+
 static inline void __down_write(struct rw_semaphore *sem)
 {
-	long tmp;
-	asm volatile("# beginning down_write\n\t"
-		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"
-		     /* adds 0xffff0001, returns the old value */
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t"
-		     /* was the active mask 0 before? */
-		     "  jz        1f\n"
-		     "  call call_rwsem_down_write_failed\n"
-		     "1:\n"
-		     "# ending down_write"
-		     : "+m" (sem->count), "=d" (tmp)
-		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc");
+	____down_write(sem, "call_rwsem_down_write_failed");
+}
+
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (IS_ERR(____down_write(sem, "call_rwsem_down_write_failed_killable")))
+		return -EINTR;
+
+	return 0;
 }
 
 /*
diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
index 40027db99140..d1a1397e1fb3 100644
--- a/arch/x86/lib/rwsem.S
+++ b/arch/x86/lib/rwsem.S
@@ -101,6 +101,14 @@ ENTRY(call_rwsem_down_write_failed)
 	ret
 ENDPROC(call_rwsem_down_write_failed)
 
+ENTRY(call_rwsem_down_write_failed_killable)
+	save_common_regs
+	movq %rax,%rdi
+	call rwsem_down_write_failed_killable
+	restore_common_regs
+	ret
+ENDPROC(call_rwsem_down_write_failed_killable)
+
 ENTRY(call_rwsem_wake)
 	/* do nothing if still outstanding active readers */
 	__ASM_HALF_SIZE(dec) %__ASM_HALF_REG(dx)
-- 
2.8.0.rc3


* [PATCH 11/11] locking, rwsem: provide down_write_killable
  2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
                   ` (9 preceding siblings ...)
  2016-04-01 11:04 ` [PATCH 10/11] x86, " Michal Hocko
@ 2016-04-01 11:04 ` Michal Hocko
  10 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-01 11:04 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Now that all the architectures implement the necessary glue code
we can introduce down_write_killable. The only difference wrt. the
regular down_write is that the slow path waits in TASK_KILLABLE state
and an interruption by a fatal signal is reported as -EINTR to the
caller.
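
A minimal sketch of the resulting calling convention (the caller is
hypothetical, not part of this patch):

	int ret = down_write_killable(&sem);

	if (ret)	/* -EINTR: a fatal signal arrived while blocked */
		return ret;
	/* ... write-side critical section ... */
	up_write(&sem);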

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/lockdep.h | 15 +++++++++++++++
 include/linux/rwsem.h   |  1 +
 kernel/locking/rwsem.c  | 19 +++++++++++++++++++
 3 files changed, 35 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 4dca42fd32f5..13a80779b6da 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -446,6 +446,18 @@ do {								\
 	lock_acquired(&(_lock)->dep_map, _RET_IP_);			\
 } while (0)
 
+#define LOCK_CONTENDED_RETURN(_lock, try, lock)			\
+({								\
+	int ____err = 0;					\
+	if (!try(_lock)) {					\
+		lock_contended(&(_lock)->dep_map, _RET_IP_);	\
+		____err = lock(_lock);				\
+	}							\
+	if (!____err)						\
+		lock_acquired(&(_lock)->dep_map, _RET_IP_);	\
+	____err;						\
+})
+
 #else /* CONFIG_LOCK_STAT */
 
 #define lock_contended(lockdep_map, ip) do {} while (0)
@@ -454,6 +466,9 @@ do {								\
 #define LOCK_CONTENDED(_lock, try, lock) \
 	lock(_lock)
 
+#define LOCK_CONTENDED_RETURN(_lock, try, lock) \
+	lock(_lock)
+
 #endif /* CONFIG_LOCK_STAT */
 
 #ifdef CONFIG_LOCKDEP
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 7d7ae029dac5..d1c12d160ace 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -118,6 +118,7 @@ extern int down_read_trylock(struct rw_semaphore *sem);
  * lock for writing
  */
 extern void down_write(struct rw_semaphore *sem);
+extern int __must_check down_write_killable(struct rw_semaphore *sem);
 
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 205be0ce34de..c817216c1615 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -55,6 +55,25 @@ void __sched down_write(struct rw_semaphore *sem)
 EXPORT_SYMBOL(down_write);
 
 /*
+ * lock for writing
+ */
+int __sched down_write_killable(struct rw_semaphore *sem)
+{
+	might_sleep();
+	rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
+
+	if (LOCK_CONTENDED_RETURN(sem, __down_write_trylock, __down_write_killable)) {
+		rwsem_release(&sem->dep_map, 1, _RET_IP_);
+		return -EINTR;
+	}
+
+	rwsem_set_owner(sem);
+	return 0;
+}
+
+EXPORT_SYMBOL(down_write_killable);
+
+/*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
 int down_write_trylock(struct rw_semaphore *sem)
-- 
2.8.0.rc3


* Re: [PATCH 01/11] locking, rwsem: get rid of __down_write_nested
  2016-04-01 11:04 ` [PATCH 01/11] locking, rwsem: get rid of __down_write_nested Michal Hocko
@ 2016-04-02  0:28   ` Davidlohr Bueso
  0 siblings, 0 replies; 51+ messages in thread
From: Davidlohr Bueso @ 2016-04-02  0:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Michal Hocko

On Fri, 01 Apr 2016, Michal Hocko wrote:

>From: Michal Hocko <mhocko@suse.com>
>
>This is no longer used anywhere and all callers (__down_write) use
>0 as a subclass. Ditch __down_write_nested to make the code easier
>to follow.
>
>This shouldn't introduce any functional change.
>
>Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Davidlohr Bueso <dave@stgolabs.net>


* Re: [PATCH 02/11] locking, rwsem: drop explicit memory barriers
  2016-04-01 11:04 ` [PATCH 02/11] locking, rwsem: drop explicit memory barriers Michal Hocko
@ 2016-04-02  1:17   ` Davidlohr Bueso
  2016-04-04  9:03     ` Michal Hocko
  2016-04-06  9:06     ` [PATCH] sparc, " Michal Hocko
  0 siblings, 2 replies; 51+ messages in thread
From: Davidlohr Bueso @ 2016-04-02  1:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Michal Hocko

On Fri, 01 Apr 2016, Michal Hocko wrote:

>From: Michal Hocko <mhocko@suse.com>
>
>sh and xtensa seem to be the only architectures which use explicit
>memory barriers for rw_semaphore operations even though they are not
>really needed because a full memory barrier is always implied by
>atomic_{inc,dec,add,sub}_return resp. cmpxchg. Remove them.

Heh, and sh even defines smp_store_mb() with xchg(), which makes the above
even more so.

>
>Signed-off-by: Michal Hocko <mhocko@suse.com>
>---
> arch/sh/include/asm/rwsem.h     | 14 ++------------
> arch/xtensa/include/asm/rwsem.h | 14 ++------------
> 2 files changed, 4 insertions(+), 24 deletions(-)

I think we can actually get rid of these files altogether. While I don't know the archs,
they do not appear to implement any kind of workaround/optimization, which is obviously
the whole purpose of defining per-arch rwsem internals, unless the redundant barriers are
actually the purpose. So we could use the generic rwsem.h (which has optimizations and
ACQUIRE/RELEASE semantics, although nops for either sh or xtensa specifically). A first
inspection shows that the code is in fact the same and continues to use 32bit rwsems.

Thanks,
Davidlohr


>diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
>index a5104bebd1eb..f6c951c7a875 100644
>--- a/arch/sh/include/asm/rwsem.h
>+++ b/arch/sh/include/asm/rwsem.h
>@@ -24,9 +24,7 @@
>  */
> static inline void __down_read(struct rw_semaphore *sem)
> {
>-	if (atomic_inc_return((atomic_t *)(&sem->count)) > 0)
>-		smp_wmb();
>-	else
>+	if (atomic_inc_return((atomic_t *)(&sem->count)) <= 0)
> 		rwsem_down_read_failed(sem);
> }
>
>@@ -37,7 +35,6 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
> 	while ((tmp = sem->count) >= 0) {
> 		if (tmp == cmpxchg(&sem->count, tmp,
> 				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
>-			smp_wmb();
> 			return 1;
> 		}
> 	}
>@@ -53,9 +50,7 @@ static inline void __down_write(struct rw_semaphore *sem)
>
> 	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
> 				(atomic_t *)(&sem->count));
>-	if (tmp == RWSEM_ACTIVE_WRITE_BIAS)
>-		smp_wmb();
>-	else
>+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
> 		rwsem_down_write_failed(sem);
> }
>
>@@ -65,7 +60,6 @@ static inline int __down_write_trylock(struct rw_semaphore *sem)
>
> 	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
> 		      RWSEM_ACTIVE_WRITE_BIAS);
>-	smp_wmb();
> 	return tmp == RWSEM_UNLOCKED_VALUE;
> }
>
>@@ -76,7 +70,6 @@ static inline void __up_read(struct rw_semaphore *sem)
> {
> 	int tmp;
>
>-	smp_wmb();
> 	tmp = atomic_dec_return((atomic_t *)(&sem->count));
> 	if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
> 		rwsem_wake(sem);
>@@ -87,7 +80,6 @@ static inline void __up_read(struct rw_semaphore *sem)
>  */
> static inline void __up_write(struct rw_semaphore *sem)
> {
>-	smp_wmb();
> 	if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
> 			      (atomic_t *)(&sem->count)) < 0)
> 		rwsem_wake(sem);
>@@ -108,7 +100,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
> {
> 	int tmp;
>
>-	smp_wmb();
> 	tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
> 	if (tmp < 0)
> 		rwsem_downgrade_wake(sem);
>@@ -119,7 +110,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
>  */
> static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
> {
>-	smp_mb();
> 	return atomic_add_return(delta, (atomic_t *)(&sem->count));
> }
>
>diff --git a/arch/xtensa/include/asm/rwsem.h b/arch/xtensa/include/asm/rwsem.h
>index 249619e7e7f2..593483f6e1ff 100644
>--- a/arch/xtensa/include/asm/rwsem.h
>+++ b/arch/xtensa/include/asm/rwsem.h
>@@ -29,9 +29,7 @@
>  */
> static inline void __down_read(struct rw_semaphore *sem)
> {
>-	if (atomic_add_return(1,(atomic_t *)(&sem->count)) > 0)
>-		smp_wmb();
>-	else
>+	if (atomic_add_return(1,(atomic_t *)(&sem->count)) <= 0)
> 		rwsem_down_read_failed(sem);
> }
>
>@@ -42,7 +40,6 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
> 	while ((tmp = sem->count) >= 0) {
> 		if (tmp == cmpxchg(&sem->count, tmp,
> 				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
>-			smp_wmb();
> 			return 1;
> 		}
> 	}
>@@ -58,9 +55,7 @@ static inline void __down_write(struct rw_semaphore *sem)
>
> 	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
> 				(atomic_t *)(&sem->count));
>-	if (tmp == RWSEM_ACTIVE_WRITE_BIAS)
>-		smp_wmb();
>-	else
>+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
> 		rwsem_down_write_failed(sem);
> }
>
>@@ -70,7 +65,6 @@ static inline int __down_write_trylock(struct rw_semaphore *sem)
>
> 	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
> 		      RWSEM_ACTIVE_WRITE_BIAS);
>-	smp_wmb();
> 	return tmp == RWSEM_UNLOCKED_VALUE;
> }
>
>@@ -81,7 +75,6 @@ static inline void __up_read(struct rw_semaphore *sem)
> {
> 	int tmp;
>
>-	smp_wmb();
> 	tmp = atomic_sub_return(1,(atomic_t *)(&sem->count));
> 	if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
> 		rwsem_wake(sem);
>@@ -92,7 +85,6 @@ static inline void __up_read(struct rw_semaphore *sem)
>  */
> static inline void __up_write(struct rw_semaphore *sem)
> {
>-	smp_wmb();
> 	if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
> 			      (atomic_t *)(&sem->count)) < 0)
> 		rwsem_wake(sem);
>@@ -113,7 +105,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
> {
> 	int tmp;
>
>-	smp_wmb();
> 	tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
> 	if (tmp < 0)
> 		rwsem_downgrade_wake(sem);
>@@ -124,7 +115,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
>  */
> static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
> {
>-	smp_mb();
> 	return atomic_add_return(delta, (atomic_t *)(&sem->count));
> }
>
>-- 
>2.8.0.rc3
>


* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-01 11:04 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
@ 2016-04-02  4:41   ` Davidlohr Bueso
  2016-04-04  9:17     ` Michal Hocko
  2016-05-10 10:43   ` Tetsuo Handa
  1 sibling, 1 reply; 51+ messages in thread
From: Davidlohr Bueso @ 2016-04-02  4:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Michal Hocko

On Fri, 01 Apr 2016, Michal Hocko wrote:

>From: Michal Hocko <mhocko@suse.com>
>
>Introduce a generic implementation necessary for down_write_killable.
>This is a trivial extension of the already existing down_write call
>which can be interrupted by SIGKILL. This patch doesn't provide
>down_write_killable yet because arches have to provide the necessary
>pieces first.
>
>rwsem_down_write_failed, which is the generic slow path for the
>write lock, is extended to take a task state and renamed to
>__rwsem_down_write_failed_state. The return value is either a valid
>semaphore pointer or ERR_PTR(-EINTR).
>
>rwsem_down_write_failed_killable is exported as a new way to wait for
>the lock and be killable.
>
>For the rwsem-spinlock implementation the current __down_write is
>updated in a similar way to __rwsem_down_write_failed_state, except it
>doesn't need new exports, just a visible __down_write_killable.
>
>Architectures which are not using the generic rwsem implementation are
>supposed to provide their own __down_write_killable implementation and
>use rwsem_down_write_failed_killable for the slow path.

So in a nutshell, this is supposed to be the (writer) rwsem counterpart of
mutex_lock_killable() and down_killable(), right?

[...]

>--- a/kernel/locking/rwsem-xadd.c
>+++ b/kernel/locking/rwsem-xadd.c
>@@ -433,12 +433,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
> /*
>  * Wait until we successfully acquire the write lock
>  */
>-__visible
>-struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
>+static inline struct rw_semaphore *
>+__rwsem_down_write_failed_state(struct rw_semaphore *sem, int state)

fwiw I'm not a fan of the _state naming. While I understand why you chose it, I feel
it does not really describe the purpose of the call itself. The state logic alone is
really quite small and therefore should not govern the function name. Why not just apply
KISS and label things _common, i.e. like mutexes do? This would also standardize names a
bit.

> {
> 	long count;
> 	bool waiting = true; /* any queued threads before us */
> 	struct rwsem_waiter waiter;
>+	struct rw_semaphore *ret = sem;
>
> 	/* undo write bias from down_write operation, stop active locking */
> 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
>@@ -478,7 +479,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
> 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
>
> 	/* wait until we successfully acquire the lock */
>-	set_current_state(TASK_UNINTERRUPTIBLE);
>+	set_current_state(state);
> 	while (true) {
> 		if (rwsem_try_write_lock(count, sem))
> 			break;
>@@ -486,21 +487,39 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
>
> 		/* Block until there are no active lockers. */
> 		do {
>+			if (signal_pending_state(state, current)) {

                           ^^ unlikely()?

>+				raw_spin_lock_irq(&sem->wait_lock);

If the lock is highly contended + a bad workload for spin-on-owner, this could take a while :)
Of course, this is a side effect of the wait until no active lockers optimization which avoids
the wait_lock in the first place, so fortunately it somewhat mitigates the situation.

>+				ret = ERR_PTR(-EINTR);
>+				goto out;
>+			}
> 			schedule();
>-			set_current_state(TASK_UNINTERRUPTIBLE);
>+			set_current_state(state);
> 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
>
> 		raw_spin_lock_irq(&sem->wait_lock);
> 	}
>+out:
> 	__set_current_state(TASK_RUNNING);
>-

You certainly don't want this if exiting due to a TASK_KILLABLE situation.

> 	list_del(&waiter.list);
> 	raw_spin_unlock_irq(&sem->wait_lock);
>
>-	return sem;
>+	return ret;
>+}

Thanks,
Davidlohr


* Re: [PATCH 02/11] locking, rwsem: drop explicit memory barriers
  2016-04-02  1:17   ` Davidlohr Bueso
@ 2016-04-04  9:03     ` Michal Hocko
  2016-04-04  9:06       ` [PATCH 1/2] xtensa, rwsem: drop superfluous arch specific implementation Michal Hocko
  2016-04-06  9:06     ` [PATCH] sparc, " Michal Hocko
  1 sibling, 1 reply; 51+ messages in thread
From: Michal Hocko @ 2016-04-04  9:03 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Fri 01-04-16 18:17:53, Davidlohr Bueso wrote:
[...]
> I think we can actually get rid of these files altogether. While I
> don't know the archs, it does not appear to be implementing any kind
> of workaround/optimization, which is obviously the whole purpose of
> defining per-arch rwsem internals, unless the redundant barriers are
> actually the purpose. So we could use the generic rwsem.h (which has
> optimizations and ACQUIRE/RELEASE semantics, although nops for either
> sh or xtensa specifically). A first inspection shows that the code is
> in fact the same and continues to use 32-bit rwsems.

OK, so this has passed my defconfig and allnoconfig compile tests for
xtensa; allyesconfig has failed due to some unrelated reasons:
clk-qoriq.c:(.init.text+0x3ebd3): dangerous relocation: l32r: literal target out of range (try using text-section-literals): (.init.literal+0x17c60)
and zillions of similar errors... The same error occurs without the patch.

I do not have a crosscompiler[1] for the sh arch so I couldn't test it,
but there shouldn't be any issues in principle. I will send two patches
as a follow-up.

---
[1] https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/ doesn't
seem to have it.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/2] xtensa, rwsem: drop superfluous arch specific implementation
  2016-04-04  9:03     ` Michal Hocko
@ 2016-04-04  9:06       ` Michal Hocko
  2016-04-04  9:06         ` [PATCH 2/2] sh, " Michal Hocko
  2016-04-04 10:23         ` [PATCH 1/2] xtensa, " Max Filippov
  0 siblings, 2 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-04  9:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Since "locking, rwsem: drop explicit memory barriers" the arch specific
code is basically the same as the generic one, so we can drop the
superfluous code.

Suggested-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/xtensa/include/asm/Kbuild  |   1 +
 arch/xtensa/include/asm/rwsem.h | 134 ----------------------------------------
 2 files changed, 1 insertion(+), 134 deletions(-)
 delete mode 100644 arch/xtensa/include/asm/rwsem.h

diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index b56855a1382a..28cf4c5d65ef 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -22,6 +22,7 @@ generic-y += mm-arch-hooks.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += resource.h
+generic-y += rwsem.h
 generic-y += sections.h
 generic-y += siginfo.h
 generic-y += statfs.h
diff --git a/arch/xtensa/include/asm/rwsem.h b/arch/xtensa/include/asm/rwsem.h
deleted file mode 100644
index 6283823b8040..000000000000
--- a/arch/xtensa/include/asm/rwsem.h
+++ /dev/null
@@ -1,134 +0,0 @@
-/*
- * include/asm-xtensa/rwsem.h
- *
- * This file is subject to the terms and conditions of the GNU General Public
- * License.  See the file "COPYING" in the main directory of this archive
- * for more details.
- *
- * Largely copied from include/asm-ppc/rwsem.h
- *
- * Copyright (C) 2001 - 2005 Tensilica Inc.
- */
-
-#ifndef _XTENSA_RWSEM_H
-#define _XTENSA_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "Please don't include <asm/rwsem.h> directly, use <linux/rwsem.h> instead."
-#endif
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000
-#define RWSEM_ACTIVE_BIAS		0x00000001
-#define RWSEM_ACTIVE_MASK		0x0000ffff
-#define RWSEM_WAITING_BIAS		(-0x00010000)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	if (atomic_add_return(1,(atomic_t *)(&sem->count)) <= 0)
-		rwsem_down_read_failed(sem);
-}
-
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	while ((tmp = sem->count) >= 0) {
-		if (tmp == cmpxchg(&sem->count, tmp,
-				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			return 1;
-		}
-	}
-	return 0;
-}
-
-/*
- * lock for writing
- */
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
-				(atomic_t *)(&sem->count));
-	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
-				(atomic_t *)(&sem->count));
-	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-
-	return 0;
-}
-
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-		      RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	tmp = atomic_sub_return(1,(atomic_t *)(&sem->count));
-	if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
-		rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
-			      (atomic_t *)(&sem->count)) < 0)
-		rwsem_wake(sem);
-}
-
-/*
- * implement atomic add functionality
- */
-static inline void rwsem_atomic_add(int delta, struct rw_semaphore *sem)
-{
-	atomic_add(delta, (atomic_t *)(&sem->count));
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
-	if (tmp < 0)
-		rwsem_downgrade_wake(sem);
-}
-
-/*
- * implement exchange and add functionality
- */
-static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
-{
-	return atomic_add_return(delta, (atomic_t *)(&sem->count));
-}
-
-#endif	/* _XTENSA_RWSEM_H */
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 2/2] sh, rwsem: drop superfluous arch specific implementation
  2016-04-04  9:06       ` [PATCH 1/2] xtensa, rwsem: drop superfluous arch specific implementation Michal Hocko
@ 2016-04-04  9:06         ` Michal Hocko
  2016-04-06  9:26           ` Peter Zijlstra
  2016-04-04 10:23         ` [PATCH 1/2] xtensa, " Max Filippov
  1 sibling, 1 reply; 51+ messages in thread
From: Michal Hocko @ 2016-04-04  9:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Since "locking, rwsem: drop explicit memory barriers" the arch specific
code is basically the same as the generic one, so we can drop the
superfluous code.

Suggested-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/sh/include/asm/Kbuild  |   1 +
 arch/sh/include/asm/rwsem.h | 130 --------------------------------------------
 2 files changed, 1 insertion(+), 130 deletions(-)
 delete mode 100644 arch/sh/include/asm/rwsem.h

diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index aac452b26aa8..6d866c7a9b74 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -25,6 +25,7 @@ generic-y += percpu.h
 generic-y += poll.h
 generic-y += preempt.h
 generic-y += resource.h
+generic-y += rwsem.h
 generic-y += sembuf.h
 generic-y += serial.h
 generic-y += shmbuf.h
diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
deleted file mode 100644
index 8a457b83d2a5..000000000000
--- a/arch/sh/include/asm/rwsem.h
+++ /dev/null
@@ -1,130 +0,0 @@
-/*
- * include/asm-sh/rwsem.h: R/W semaphores for SH using the stuff
- * in lib/rwsem.c.
- */
-
-#ifndef _ASM_SH_RWSEM_H
-#define _ASM_SH_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000
-#define RWSEM_ACTIVE_BIAS		0x00000001
-#define RWSEM_ACTIVE_MASK		0x0000ffff
-#define RWSEM_WAITING_BIAS		(-0x00010000)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	if (atomic_inc_return((atomic_t *)(&sem->count)) <= 0)
-		rwsem_down_read_failed(sem);
-}
-
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	while ((tmp = sem->count) >= 0) {
-		if (tmp == cmpxchg(&sem->count, tmp,
-				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			return 1;
-		}
-	}
-	return 0;
-}
-
-/*
- * lock for writing
- */
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
-				(atomic_t *)(&sem->count));
-	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
-				(atomic_t *)(&sem->count));
-	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-
-	return 0;
-}
-
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-		      RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	tmp = atomic_dec_return((atomic_t *)(&sem->count));
-	if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
-		rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
-			      (atomic_t *)(&sem->count)) < 0)
-		rwsem_wake(sem);
-}
-
-/*
- * implement atomic add functionality
- */
-static inline void rwsem_atomic_add(int delta, struct rw_semaphore *sem)
-{
-	atomic_add(delta, (atomic_t *)(&sem->count));
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	int tmp;
-
-	tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
-	if (tmp < 0)
-		rwsem_downgrade_wake(sem);
-}
-
-/*
- * implement exchange and add functionality
- */
-static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
-{
-	return atomic_add_return(delta, (atomic_t *)(&sem->count));
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ASM_SH_RWSEM_H */
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-02  4:41   ` Davidlohr Bueso
@ 2016-04-04  9:17     ` Michal Hocko
  2016-04-04  9:21       ` Peter Zijlstra
  2016-04-07  6:58       ` Davidlohr Bueso
  0 siblings, 2 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-04  9:17 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Fri 01-04-16 21:41:25, Davidlohr Bueso wrote:
> On Fri, 01 Apr 2016, Michal Hocko wrote:
> 
> >From: Michal Hocko <mhocko@suse.com>
> >
> >Introduce a generic implementation necessary for down_write_killable.
> >This is a trivial extension of the already existing down_write call
> >which can be interrupted by SIGKILL.  This patch doesn't provide
> >down_write_killable yet because arches have to provide the necessary
> >pieces before.
> >
> >rwsem_down_write_failed which is a generic slow path for the
> >write lock is extended to allow a task state and renamed to
> >__rwsem_down_write_failed_state. The return value is either a valid
> >semaphore pointer or ERR_PTR(-EINTR).
> >
> >rwsem_down_write_failed_killable is exported as a new way to wait for
> >the lock and be killable.
> >
> >For the rwsem-spinlock implementation the current __down_write is
> >updated in a similar way to __rwsem_down_write_failed_state, except it
> >doesn't need new exports, just a visible __down_write_killable.
> >
> >Architectures which are not using the generic rwsem implementation are
> >supposed to provide their __down_write_killable implementation and
> >use rwsem_down_write_failed_killable for the slow path.
> 
> So in a nutshell, this is supposed to be the (writer) rwsem counterpart of
> mutex_lock_killable() and down_killable(), right?

Yes.

> [...]
> 
> >--- a/kernel/locking/rwsem-xadd.c
> >+++ b/kernel/locking/rwsem-xadd.c
> >@@ -433,12 +433,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
> >/*
> > * Wait until we successfully acquire the write lock
> > */
> >-__visible
> >-struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
> >+static inline struct rw_semaphore *
> >+__rwsem_down_write_failed_state(struct rw_semaphore *sem, int state)
> 
> fwiw I'm not a fan of the _state naming. While I understand why you chose it, I feel
> it does not really describe the purpose of the call itself. The state logic alone is
> really quite small and therefore should not govern the function name. Why not just apply
> KISS and label things _common, i.e. like mutexes do? This would also standardize names a
> bit.

I really do not care much about naming. So if _common sounds better I
can certainly rename.

> 
> >{
> >	long count;
> >	bool waiting = true; /* any queued threads before us */
> >	struct rwsem_waiter waiter;
> >+	struct rw_semaphore *ret = sem;
> >
> >	/* undo write bias from down_write operation, stop active locking */
> >	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
> >@@ -478,7 +479,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
> >		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
> >
> >	/* wait until we successfully acquire the lock */
> >-	set_current_state(TASK_UNINTERRUPTIBLE);
> >+	set_current_state(state);
> >	while (true) {
> >		if (rwsem_try_write_lock(count, sem))
> >			break;
> >@@ -486,21 +487,39 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
> >
> >		/* Block until there are no active lockers. */
> >		do {
> >+			if (signal_pending_state(state, current)) {
> 
>                           ^^ unlikely()?

The generated code is identical after I've added unlikely. I haven't
tried more gcc versions (mine is 5.3.1) but is this worth it?

> 
> >+				raw_spin_lock_irq(&sem->wait_lock);
> 
> If the lock is highly contended + a bad workload for spin-on-owner, this could take a while :)
> Of course, this is a side effect of the wait until no active lockers optimization which avoids
> the wait_lock in the first place, so fortunately it somewhat mitigates the situation.
> 
> >+				ret = ERR_PTR(-EINTR);
> >+				goto out;
> >+			}
> >			schedule();
> >-			set_current_state(TASK_UNINTERRUPTIBLE);
> >+			set_current_state(state);
> >		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> >
> >		raw_spin_lock_irq(&sem->wait_lock);
> >	}
> >+out:
> >	__set_current_state(TASK_RUNNING);
> >-
> 
> You certainly don't want this iff exiting due to TASK_KILLABLE situation.

Not sure I got your point here.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-04  9:17     ` Michal Hocko
@ 2016-04-04  9:21       ` Peter Zijlstra
  2016-04-07  6:58       ` Davidlohr Bueso
  1 sibling, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-04-04  9:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Davidlohr Bueso, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Mon, Apr 04, 2016 at 11:17:00AM +0200, Michal Hocko wrote:
> > >@@ -486,21 +487,39 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
> > >
> > >		/* Block until there are no active lockers. */
> > >		do {
> > >+			if (signal_pending_state(state, current)) {
> > 
> >                           ^^ unlikely()?
> 
> The generated code is identical after I've added unlikely. I haven't
> tried more gcc versions (mine is 5.3.1) but is this worth it?

Both signal_pending() and __fatal_signal_pending() already have an
unlikely(), which is why adding it here is superfluous.
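
For reference, a sketch of the helpers in question, modeled on the
include/linux/sched.h definitions of that era (slightly simplified):

	static inline int signal_pending(struct task_struct *p)
	{
		/* the unlikely() already lives inside the helper */
		return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING));
	}

	static inline int signal_pending_state(long state, struct task_struct *p)
	{
		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
			return 0;
		if (!signal_pending(p))
			return 0;

		return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
	}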

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] xtensa, rwsem: drop superfluous arch specific implementation
  2016-04-04  9:06       ` [PATCH 1/2] xtensa, rwsem: drop superfluous arch specific implementation Michal Hocko
  2016-04-04  9:06         ` [PATCH 2/2] sh, " Michal Hocko
@ 2016-04-04 10:23         ` Max Filippov
  1 sibling, 0 replies; 51+ messages in thread
From: Max Filippov @ 2016-04-04 10:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, maintainer:X86 ARCHITECTURE...,
	open list:ALPHA PORT, open list:IA64 (Itanium) PL...,
	linux-s390, open list:SUPERH, open list:SPARC + UltraSPAR...,
	linux-xtensa, Linux-Arch, Michal Hocko

On Mon, Apr 4, 2016 at 12:06 PM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Since "locking, rwsem: drop explicit memory barriers" the arch specific
> code is basically the same as the generic one, so we can drop the
> superfluous code.
>
> Suggested-by: Davidlohr Bueso <dave@stgolabs.net>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/xtensa/include/asm/Kbuild  |   1 +
>  arch/xtensa/include/asm/rwsem.h | 134 ----------------------------------------
>  2 files changed, 1 insertion(+), 134 deletions(-)
>  delete mode 100644 arch/xtensa/include/asm/rwsem.h

Acked-by: Max Filippov <jcmvbkbc@gmail.com>

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH] sparc, rwsem: drop superfluous arch specific implementation
  2016-04-02  1:17   ` Davidlohr Bueso
  2016-04-04  9:03     ` Michal Hocko
@ 2016-04-06  9:06     ` Michal Hocko
  1 sibling, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-06  9:06 UTC (permalink / raw)
  To: David S. Miller; +Cc: Davidlohr Bueso, LKML, sparclinux, linux-arch

[Reducing CC list]

On Fri 01-04-16 18:17:53, Davidlohr Bueso wrote:
> On Fri, 01 Apr 2016, Michal Hocko wrote:
[...]
> >arch/sh/include/asm/rwsem.h     | 14 ++------------
> >arch/xtensa/include/asm/rwsem.h | 14 ++------------
> >2 files changed, 4 insertions(+), 24 deletions(-)
> 
> I think we can actually get rid of these files altogether.

And after double checking, sparc is using the generic implementation
without any arch specific tweaks as well. So it can be dropped.

David, does the following look good to you or am I missing something?
I will drop "[PATCH 08/11] sparc, rwsem: provide __down_write_killable"
then.
---
>From 8a71514d2973f9e5c0330a23e0409d195e796b4c Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 6 Apr 2016 10:33:37 +0200
Subject: [PATCH] sparc, rwsem: drop superfluous arch specific implementation

sparc basically reuses the generic implementation of rwsem, so we can
reuse the code rather than duplicate it.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/sparc/include/asm/Kbuild  |   1 +
 arch/sparc/include/asm/rwsem.h | 119 -----------------------------------------
 2 files changed, 1 insertion(+), 119 deletions(-)
 delete mode 100644 arch/sparc/include/asm/rwsem.h

diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index e928618838bc..6024c26c0585 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += mm-arch-hooks.h
 generic-y += module.h
 generic-y += mutex.h
 generic-y += preempt.h
+generic-y += rwsem.h
 generic-y += serial.h
 generic-y += trace_clock.h
 generic-y += types.h
diff --git a/arch/sparc/include/asm/rwsem.h b/arch/sparc/include/asm/rwsem.h
deleted file mode 100644
index e5a0d575bc7f..000000000000
--- a/arch/sparc/include/asm/rwsem.h
+++ /dev/null
@@ -1,119 +0,0 @@
-/*
- * rwsem.h: R/W semaphores implemented using CAS
- *
- * Written by David S. Miller (davem@redhat.com), 2001.
- * Derived from asm-i386/rwsem.h
- */
-#ifndef _SPARC64_RWSEM_H
-#define _SPARC64_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000L
-#define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_ACTIVE_MASK		0xffffffffL
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	if (unlikely(atomic64_inc_return((atomic64_t *)(&sem->count)) <= 0L))
-		rwsem_down_read_failed(sem);
-}
-
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	while ((tmp = sem->count) >= 0L) {
-		if (tmp == cmpxchg(&sem->count, tmp,
-				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			return 1;
-		}
-	}
-	return 0;
-}
-
-/*
- * lock for writing
- */
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic64_add_return(RWSEM_ACTIVE_WRITE_BIAS,
-				  (atomic64_t *)(&sem->count));
-	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-		      RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic64_dec_return((atomic64_t *)(&sem->count));
-	if (unlikely(tmp < -1L && (tmp & RWSEM_ACTIVE_MASK) == 0L))
-		rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	if (unlikely(atomic64_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
-					 (atomic64_t *)(&sem->count)) < 0L))
-		rwsem_wake(sem);
-}
-
-/*
- * implement atomic add functionality
- */
-static inline void rwsem_atomic_add(long delta, struct rw_semaphore *sem)
-{
-	atomic64_add(delta, (atomic64_t *)(&sem->count));
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic64_add_return(-RWSEM_WAITING_BIAS, (atomic64_t *)(&sem->count));
-	if (tmp < 0L)
-		rwsem_downgrade_wake(sem);
-}
-
-/*
- * implement exchange and add functionality
- */
-static inline long rwsem_atomic_update(long delta, struct rw_semaphore *sem)
-{
-	return atomic64_add_return(delta, (atomic64_t *)(&sem->count));
-}
-
-#endif /* __KERNEL__ */
-
-#endif /* _SPARC64_RWSEM_H */
-- 
2.8.0.rc3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/2] sh, rwsem: drop superfluous arch specific implementation
  2016-04-04  9:06         ` [PATCH 2/2] sh, " Michal Hocko
@ 2016-04-06  9:26           ` Peter Zijlstra
  2016-04-06  9:50             ` Geert Uytterhoeven
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-04-06  9:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko, ysato,
	dalias


+Cc maintainers; the linux-sh list has a very low signal-to-noise ratio
as it's been appropriated for Renesas ARM parts.

On Mon, Apr 04, 2016 at 11:06:40AM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Since "locking, rwsem: drop explicit memory barriers" the arch specific
> code is basically the same as the generic one, so we can drop the
> superfluous code.
> 
> Suggested-by: Davidlohr Bueso <dave@stgolabs.net>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/sh/include/asm/Kbuild  |   1 +
>  arch/sh/include/asm/rwsem.h | 130 --------------------------------------------
>  2 files changed, 1 insertion(+), 130 deletions(-)
>  delete mode 100644 arch/sh/include/asm/rwsem.h
> 
> diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
> index aac452b26aa8..6d866c7a9b74 100644
> --- a/arch/sh/include/asm/Kbuild
> +++ b/arch/sh/include/asm/Kbuild
> @@ -25,6 +25,7 @@ generic-y += percpu.h
>  generic-y += poll.h
>  generic-y += preempt.h
>  generic-y += resource.h
> +generic-y += rwsem.h
>  generic-y += sembuf.h
>  generic-y += serial.h
>  generic-y += shmbuf.h
> diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
> deleted file mode 100644
> index 8a457b83d2a5..000000000000
> --- a/arch/sh/include/asm/rwsem.h
> +++ /dev/null
> @@ -1,130 +0,0 @@
> -/*
> - * include/asm-sh/rwsem.h: R/W semaphores for SH using the stuff
> - * in lib/rwsem.c.
> - */
> -
> -#ifndef _ASM_SH_RWSEM_H
> -#define _ASM_SH_RWSEM_H
> -
> -#ifndef _LINUX_RWSEM_H
> -#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
> -#endif
> -
> -#ifdef __KERNEL__
> -
> -#define RWSEM_UNLOCKED_VALUE		0x00000000
> -#define RWSEM_ACTIVE_BIAS		0x00000001
> -#define RWSEM_ACTIVE_MASK		0x0000ffff
> -#define RWSEM_WAITING_BIAS		(-0x00010000)
> -#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
> -#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
> -
> -/*
> - * lock for reading
> - */
> -static inline void __down_read(struct rw_semaphore *sem)
> -{
> -	if (atomic_inc_return((atomic_t *)(&sem->count)) <= 0)
> -		rwsem_down_read_failed(sem);
> -}
> -
> -static inline int __down_read_trylock(struct rw_semaphore *sem)
> -{
> -	int tmp;
> -
> -	while ((tmp = sem->count) >= 0) {
> -		if (tmp == cmpxchg(&sem->count, tmp,
> -				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
> -			return 1;
> -		}
> -	}
> -	return 0;
> -}
> -
> -/*
> - * lock for writing
> - */
> -static inline void __down_write(struct rw_semaphore *sem)
> -{
> -	int tmp;
> -
> -	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
> -				(atomic_t *)(&sem->count));
> -	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
> -		rwsem_down_write_failed(sem);
> -}
> -
> -static inline int __down_write_killable(struct rw_semaphore *sem)
> -{
> -	int tmp;
> -
> -	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
> -				(atomic_t *)(&sem->count));
> -	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
> -		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
> -			return -EINTR;
> -
> -	return 0;
> -}
> -
> -static inline int __down_write_trylock(struct rw_semaphore *sem)
> -{
> -	int tmp;
> -
> -	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
> -		      RWSEM_ACTIVE_WRITE_BIAS);
> -	return tmp == RWSEM_UNLOCKED_VALUE;
> -}
> -
> -/*
> - * unlock after reading
> - */
> -static inline void __up_read(struct rw_semaphore *sem)
> -{
> -	int tmp;
> -
> -	tmp = atomic_dec_return((atomic_t *)(&sem->count));
> -	if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
> -		rwsem_wake(sem);
> -}
> -
> -/*
> - * unlock after writing
> - */
> -static inline void __up_write(struct rw_semaphore *sem)
> -{
> -	if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
> -			      (atomic_t *)(&sem->count)) < 0)
> -		rwsem_wake(sem);
> -}
> -
> -/*
> - * implement atomic add functionality
> - */
> -static inline void rwsem_atomic_add(int delta, struct rw_semaphore *sem)
> -{
> -	atomic_add(delta, (atomic_t *)(&sem->count));
> -}
> -
> -/*
> - * downgrade write lock to read lock
> - */
> -static inline void __downgrade_write(struct rw_semaphore *sem)
> -{
> -	int tmp;
> -
> -	tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
> -	if (tmp < 0)
> -		rwsem_downgrade_wake(sem);
> -}
> -
> -/*
> - * implement exchange and add functionality
> - */
> -static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
> -{
> -	return atomic_add_return(delta, (atomic_t *)(&sem->count));
> -}
> -
> -#endif /* __KERNEL__ */
> -#endif /* _ASM_SH_RWSEM_H */
> -- 
> 2.8.0.rc3
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/2] sh, rwsem: drop superfluous arch specific implementation
  2016-04-06  9:26           ` Peter Zijlstra
@ 2016-04-06  9:50             ` Geert Uytterhoeven
  2016-04-06 10:27               ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Geert Uytterhoeven @ 2016-04-06  9:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michal Hocko, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, the arch/x86 maintainers, alpha, linux-ia64,
	linux-s390, Linux-sh list, sparclinux, linux-xtensa, Linux-Arch,
	Michal Hocko, Yoshinori Sato, Rich Felker

Dag Peter,

On Wed, Apr 6, 2016 at 11:26 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> +Cc maintainers; the linux-sh list has a very low signal-to-noise ratio
> as it's been appropriated for Renesas ARM parts.

FTR, Renesas ARM parts have moved to linux-renesas-soc.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/2] sh, rwsem: drop superfluous arch specific implementation
  2016-04-06  9:50             ` Geert Uytterhoeven
@ 2016-04-06 10:27               ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-04-06 10:27 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Michal Hocko, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, the arch/x86 maintainers, alpha, linux-ia64,
	linux-s390, Linux-sh list, sparclinux, linux-xtensa, Linux-Arch,
	Michal Hocko, Yoshinori Sato, Rich Felker

On Wed, Apr 06, 2016 at 11:50:00AM +0200, Geert Uytterhoeven wrote:
> Dag Peter,
> 
> On Wed, Apr 6, 2016 at 11:26 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > +Cc maintainers; the linux-sh list has a very low signal-to-noise ratio
> > as it's been appropriated for Renesas ARM parts.
> 
> FTR, Renesas ARM parts have moved to linux-renesas-soc.

Ah, good to know that situation has been sorted.

Thanks!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 10/11] x86, rwsem: provide __down_write_killable
  2016-04-01 11:04 ` [PATCH 10/11] x86, " Michal Hocko
@ 2016-04-06 18:31   ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-04-06 18:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

On Fri, Apr 01, 2016 at 01:04:54PM +0200, Michal Hocko wrote:
> diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
> index 40027db99140..d1a1397e1fb3 100644
> --- a/arch/x86/lib/rwsem.S
> +++ b/arch/x86/lib/rwsem.S
> @@ -101,6 +101,14 @@ ENTRY(call_rwsem_down_write_failed)
>  	ret
>  ENDPROC(call_rwsem_down_write_failed)
>  
> +ENTRY(call_rwsem_down_write_failed_killable)
> +	save_common_regs
> +	movq %rax,%rdi
> +	call rwsem_down_write_failed_killable
> +	restore_common_regs
> +	ret
> +ENDPROC(call_rwsem_down_write_failed_killable)
> +

I had a conflict here and made that:

--- a/arch/x86/lib/rwsem.S
+++ b/arch/x86/lib/rwsem.S
@@ -106,6 +106,16 @@ ENTRY(call_rwsem_down_write_failed)
 	ret
 ENDPROC(call_rwsem_down_write_failed)
 
+ENTRY(call_rwsem_down_write_failed_killable)
+	FRAME_BEGIN
+	save_common_regs
+	movq %rax,%rdi
+	call rwsem_down_write_failed_killable
+	restore_common_regs
+	FRAME_END
+	ret
+ENDPROC(call_rwsem_down_write_failed_killable)
+
 ENTRY(call_rwsem_wake)
 	FRAME_BEGIN
 	/* do nothing if still outstanding active readers */

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-04  9:17     ` Michal Hocko
  2016-04-04  9:21       ` Peter Zijlstra
@ 2016-04-07  6:58       ` Davidlohr Bueso
  2016-04-07  7:38         ` Michal Hocko
  1 sibling, 1 reply; 51+ messages in thread
From: Davidlohr Bueso @ 2016-04-07  6:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Mon, 04 Apr 2016, Michal Hocko wrote:

>Not sure I got your point here.

You set current to TASK_KILLABLE in the sleep loop, so why do you want to
change it here to TASK_RUNNING if it's about to be killed? At least in the
case of UNINTERRUPTIBLE we do it merely as a redundancy after breaking out
of the loop. Of course we also acquired the lock in the first place by that
time and we _better_ be running.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-07  6:58       ` Davidlohr Bueso
@ 2016-04-07  7:38         ` Michal Hocko
  0 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2016-04-07  7:38 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Wed 06-04-16 23:58:07, Davidlohr Bueso wrote:
> On Mon, 04 Apr 2016, Michal Hocko wrote:
> 
> >Not sure I got your point here.
> 
> You set current to TASK_KILLABLE in the sleep loop, why do you want to change
> it here to TASK_RUNNING if its about to be killed?

Wouldn't it be unexpected to return from a lock with something other than
TASK_RUNNING?

> At least in the case of
> UNINTERRUPTABLE we do it merely as a redundancy after the breaking out of the
> loop. Of course we also acquired the lock in the first place by that time and
> we _better_ be running.

I guess the reason was that rwsem_try_write_lock might succeed and we do
not want to return with TASK_UNINTERRUPTIBLE in that case.
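
Schematically, re-quoting the loop in question with comments added here
for illustration only:

	set_current_state(state);	/* TASK_UNINTERRUPTIBLE or TASK_KILLABLE */
	while (true) {
		if (rwsem_try_write_lock(count, sem))
			break;		/* lock taken, but we are still in 'state' */
		/* ...sleep, or bail out with -EINTR on a fatal signal... */
	}
out:
	/* hence the unconditional reset, on both the lock and -EINTR paths */
	__set_current_state(TASK_RUNNING);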

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-04-01 11:04 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
  2016-04-02  4:41   ` Davidlohr Bueso
@ 2016-05-10 10:43   ` Tetsuo Handa
  2016-05-10 11:53     ` Michal Hocko
  1 sibling, 1 reply; 51+ messages in thread
From: Tetsuo Handa @ 2016-05-10 10:43 UTC (permalink / raw)
  To: Michal Hocko, LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Michal Hocko

I hit the "allowing the OOM killer to select the same thread again" problem
( http://lkml.kernel.org/r/20160408113425.GF29820@dhcp22.suse.cz ), but
I think that there is a bug in the down_write_killable() series (at least
in the "locking, rwsem: introduce basis for down_write_killable" patch).

Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160510-sem.txt.xz .
----------
[   48.303867] Out of memory: Kill process 1314 (tgid=1314) score 1000 or sacrifice child
[   48.308582] Killed process 1314 (tgid=1314) total-vm:70844kB, anon-rss:1980kB, file-rss:0kB, shmem-rss:0kB
[   49.323719] oom_reaper: unable to reap pid:1314 (tgid=1314)
[   49.338146]
[   49.338146] Showing all locks held in the system:
(...snipped...)
[   49.801071] 1 lock held by tgid=1314/1314:
[   49.803953]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810fddac>] acct_collect+0x5c/0x1e0
[   49.809200] 1 lock held by tgid=1314/1443:
[   49.812102]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81073a45>] do_exit+0x175/0xb10
(...snipped...)
[   51.188928] oom_reaper: unable to reap pid:1443 (tgid=1314)
[   55.576750] oom_reaper: unable to reap pid:1314 (tgid=1314)
[   57.717917] oom_reaper: unable to reap pid:1314 (tgid=1314)
[   59.285880] oom_reaper: unable to reap pid:1314 (tgid=1314)
[   60.818697] oom_reaper: unable to reap pid:1314 (tgid=1314)
(...snipped...)
[  174.429572] tgid=1314       D ffff88003ad93b90     0  1314   1209 0x00100084
[  174.429573]  ffff88003ad93b90 ffff88003ad8f6b8 ffff880039c3a140 ffff88003ad8c080
[  174.429574]  ffff88003ad94000 ffff88003ad8f6a0 ffff88003ad8f6b8 0000000000000000
[  174.429575]  0000000000000008 ffff88003ad93ba8 ffffffff81616190 ffff88003ad8c080
[  174.429585] Call Trace:
[  174.429586]  [<ffffffff81616190>] schedule+0x30/0x80
[  174.429587]  [<ffffffff81619e26>] rwsem_down_read_failed+0xd6/0x140
[  174.429589]  [<ffffffff812dd6f8>] call_rwsem_down_read_failed+0x18/0x30
[  174.429590]  [<ffffffff816196dd>] down_read+0x3d/0x50
[  174.429592]  [<ffffffff810fddac>] ? acct_collect+0x5c/0x1e0
[  174.429593]  [<ffffffff810fddac>] acct_collect+0x5c/0x1e0
[  174.429594]  [<ffffffff81073ff5>] do_exit+0x725/0xb10
[  174.429594]  [<ffffffff81074467>] do_group_exit+0x47/0xc0
[  174.429596]  [<ffffffff8108075f>] get_signal+0x20f/0x7b0
[  174.429597]  [<ffffffff81024fb2>] do_signal+0x32/0x700
[  174.429598]  [<ffffffff810bdc69>] ? trace_hardirqs_on+0x9/0x10
[  174.429599]  [<ffffffff810c3552>] ? rwsem_wake+0x72/0xe0
[  174.429600]  [<ffffffff812dd78b>] ? call_rwsem_wake+0x1b/0x30
[  174.429601]  [<ffffffff810b9ee0>] ? up_read+0x30/0x40
[  174.429602]  [<ffffffff8106b495>] ? exit_to_usermode_loop+0x29/0x9e
[  174.429603]  [<ffffffff8106b4bf>] exit_to_usermode_loop+0x53/0x9e
[  174.429604]  [<ffffffff8100348d>] prepare_exit_to_usermode+0x7d/0x90
[  174.429605]  [<ffffffff8161bd3e>] retint_user+0x8/0x23
[  174.429605] tgid=1314       D ffff88003aa2fbd0     0  1443   1209 0x00000084
[  174.429607]  ffff88003aa2fbd0 ffff88003ad8f6b8 ffff8800382060c0 ffff88003aa2a140
[  174.429608]  ffff88003aa30000 ffff88003ad8f6a0 ffff88003ad8f6b8 ffff88003aa2a140
[  174.429609]  0000000000000008 ffff88003aa2fbe8 ffffffff81616190 ffff88003aa2a140
[  174.429610] Call Trace:
[  174.429611]  [<ffffffff81616190>] schedule+0x30/0x80
[  174.429612]  [<ffffffff81619e26>] rwsem_down_read_failed+0xd6/0x140
[  174.429613]  [<ffffffff810bdb99>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[  174.429614]  [<ffffffff812dd6f8>] call_rwsem_down_read_failed+0x18/0x30
[  174.429615]  [<ffffffff816196dd>] down_read+0x3d/0x50
[  174.429616]  [<ffffffff81073a45>] ? do_exit+0x175/0xb10
[  174.429616]  [<ffffffff81073a45>] do_exit+0x175/0xb10
[  174.429617]  [<ffffffff81074467>] do_group_exit+0x47/0xc0
[  174.429618]  [<ffffffff8108075f>] get_signal+0x20f/0x7b0
[  174.429619]  [<ffffffff81024fb2>] do_signal+0x32/0x700
[  174.429620]  [<ffffffff8161acae>] ? _raw_spin_unlock_irq+0x2e/0x40
[  174.429621]  [<ffffffff8161a2bf>] ? rwsem_down_write_failed_killable+0x1ef/0x280
[  174.429631]  [<ffffffff8106b555>] ? syscall_slow_exit_work+0x4b/0x10d
[  174.429632]  [<ffffffff8106b495>] ? exit_to_usermode_loop+0x29/0x9e
[  174.429633]  [<ffffffff8106b4bf>] exit_to_usermode_loop+0x53/0x9e
[  174.429634]  [<ffffffff81003715>] do_syscall_64+0x135/0x1b0
[  174.429635]  [<ffffffff8161b43f>] entry_SYSCALL64_slow_path+0x25/0x25
(...snipped...)
[  217.651477] oom_reaper: unable to reap pid:1314 (tgid=1314)
[  219.071975] oom_reaper: unable to reap pid:1314 (tgid=1314)
[  220.508961] oom_reaper: unable to reap pid:1314 (tgid=1314)
[  222.022111] oom_reaper: unable to reap pid:1314 (tgid=1314)
[  223.560166] oom_reaper: unable to reap pid:1314 (tgid=1314)
[  225.267750] oom_reaper: unable to reap pid:1314 (tgid=1314)
----------

2 threads (PID: 1314 and 1443) are sleeping at rwsem_down_read_failed()
but no thread is sleeping at rwsem_down_write_failed_killable().
If there is no thread waiting for the write lock, threads waiting for
the read lock must be able to run. This suggests that one of the threads
which was waiting for the write lock forgot to wake up the reader threads.

Looking at rwsem_down_read_failed(), reader threads waiting for the
writer thread to release the lock are waiting on sem->wait_list list.
Looking at __rwsem_down_write_failed_common(), when the writer thread
escaped the

                 /* Block until there are no active lockers. */
                 do {
                         if (signal_pending_state(state, current)) {
                                 raw_spin_lock_irq(&sem->wait_lock);
                                 ret = ERR_PTR(-EINTR);
                                 goto out;
                         }
                         schedule();
                         set_current_state(state);
                 } while ((count = sem->count) & RWSEM_ACTIVE_MASK);

loop due to SIGKILL, I think that the writer thread needs to check for
remaining threads on sem->wait_list list and wake up reader threads
before rwsem_down_write_failed_killable() returns -EINTR.
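
For illustration, an untested, hypothetical sketch of that idea on the
cancellation path (the exact wake type and bookkeeping would need review
by the locking folks):

	out:
		__set_current_state(TASK_RUNNING);
		list_del(&waiter.list);
		/*
		 * Sketch: when bailing out with -EINTR while other waiters
		 * are still queued, wake them up so that they do not keep
		 * waiting behind a writer which will never take the lock.
		 */
		if (IS_ERR(ret) && !list_empty(&sem->wait_list))
			__rwsem_do_wake(sem, RWSEM_WAKE_ANY);
		raw_spin_unlock_irq(&sem->wait_lock);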

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-10 10:43   ` Tetsuo Handa
@ 2016-05-10 11:53     ` Michal Hocko
  2016-05-10 12:38       ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Michal Hocko @ 2016-05-10 11:53 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov

On Tue 10-05-16 19:43:20, Tetsuo Handa wrote:
> I hit the "allowing the OOM killer to select the same thread again" problem
> ( http://lkml.kernel.org/r/20160408113425.GF29820@dhcp22.suse.cz ), but
> I think that there is a bug in the down_write_killable() series (at least
> in the "locking, rwsem: introduce basis for down_write_killable" patch).
> 
> Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160510-sem.txt.xz .
[...]
> 2 threads (PID: 1314 and 1443) are sleeping at rwsem_down_read_failed()
> but no thread is sleeping at rwsem_down_write_failed_killable().
> If there is no thread waiting for the write lock, threads waiting for
> the read lock must be able to run. This suggests that one of the threads
> which was waiting for the write lock forgot to wake up the reader threads.

Or that the write lock holder is still keeping the lock held. I do not
see such a process in your list though. Is it possible that the
debug_show_all_locks would just miss it as it is not sleeping?

> Looking at rwsem_down_read_failed(), reader threads waiting for the
> writer thread to release the lock are waiting on sem->wait_list list.
> Looking at __rwsem_down_write_failed_common(), when the writer thread
> escaped the
> 
>                  /* Block until there are no active lockers. */
>                  do {
>                          if (signal_pending_state(state, current)) {
>                                  raw_spin_lock_irq(&sem->wait_lock);
>                                  ret = ERR_PTR(-EINTR);
>                                  goto out;
>                          }
>                          schedule();
>                          set_current_state(state);
>                  } while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> 
> loop due to SIGKILL, I think that the writer thread needs to check for
> remaining threads on sem->wait_list list and wake up reader threads
> before rwsem_down_write_failed_killable() returns -EINTR.

I am not sure I understand. The rwsem counter is not write locked while
the thread is sleeping, nor when we bail out on the pending signal, so
readers should be able to proceed, no?

Or are you suggesting that the failure path should call rwsem_wake? I
do not see __mutex_lock_common for killable wait doing something like
that and rwsem_wake is explicitly documented that it is called after the
lock state has been updated already. Now I might be missing something
subtle here but I guess the code is correct and it is more likely that
the holder of the lock wasn't killed but it is rather holding the lock
and doing something else.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-10 11:53     ` Michal Hocko
@ 2016-05-10 12:38       ` Peter Zijlstra
  2016-05-10 13:57         ` Tetsuo Handa
  2016-05-11  7:23         ` Michal Hocko
  0 siblings, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-05-10 12:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Tue, May 10, 2016 at 01:53:20PM +0200, Michal Hocko wrote:
> On Tue 10-05-16 19:43:20, Tetsuo Handa wrote:
> > I hit the "allowing the OOM killer to select the same thread again" problem
> > ( http://lkml.kernel.org/r/20160408113425.GF29820@dhcp22.suse.cz ), but
> > I think that there is a bug in the down_write_killable() series (at least
> > in the "locking, rwsem: introduce basis for down_write_killable" patch).
> > 
> > Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160510-sem.txt.xz .
> [...]
> > 2 threads (PID: 1314 and 1443) are sleeping at rwsem_down_read_failed()
> > but no thread is sleeping at rwsem_down_write_failed_killable().
> > If there is no thread waiting for the write lock, threads waiting for
> > the read lock must be able to run. This suggests that one of the threads
> > which was waiting for the write lock forgot to wake up the reader threads.
> 
> Or that the write lock holder is still keeping the lock held. I do not
> see such a process in your list though. Is it possible that the
> debug_show_all_locks would just miss it as it is not sleeping?
> 
> > Looking at rwsem_down_read_failed(), reader threads waiting for the
> > writer thread to release the lock are waiting on sem->wait_list list.
> > Looking at __rwsem_down_write_failed_common(), when the writer thread
> > escaped the
> > 
> >                  /* Block until there are no active lockers. */
> >                  do {
> >                          if (signal_pending_state(state, current)) {
> >                                  raw_spin_lock_irq(&sem->wait_lock);
> >                                  ret = ERR_PTR(-EINTR);
> >                                  goto out;
> >                          }
> >                          schedule();
> >                          set_current_state(state);
> >                  } while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> > 
> > loop due to SIGKILL, I think that the writer thread needs to check for
> > remaining threads on sem->wait_list list and wake up reader threads
> > before rwsem_down_write_failed_killable() returns -EINTR.
> 
> I am not sure I understand. The rwsem counter is not write locked while
> the thread is sleeping, nor when we bail out on the pending signal, so
> readers should be able to proceed, no?
> 
> Or are you suggesting that the failure path should call rwsem_wake? I
> do not see __mutex_lock_common for killable wait doing something like
> that and rwsem_wake is explicitly documented that it is called after the
> lock state has been updated already. Now I might be missing something
> subtle here but I guess the code is correct and it is more likely that
> the holder of the lock wasn't killed but it is rather holding the lock
> and doing something else.

Mutex is much simpler; it doesn't have to do the reader-vs-writer
fairness thing.

However, at the time I was thinking that if we have:

	reader (owner)
	writer (pending)
	reader (blocked on writer)

and writer would get cancelled, the up_read() would do a wakeup and kick
the blocked reader.

But yes, immediately kicking further pending waiters might be better.

Also, looking at it again; I think we're forgetting to re-adjust the
BIAS for the cancelled writer.

Davidlohr, Waiman, can you look at this?
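
A hypothetical, untested sketch of the missing adjustment (whether the
count or a wakeup is needed depends on whether other waiters remain
queued):

	/* on the -EINTR path, with sem->wait_lock held */
	list_del(&waiter.list);
	if (list_empty(&sem->wait_list))
		/* we were the last waiter: drop the waiting bias again */
		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
	else
		/* otherwise let the remaining waiters (e.g. readers) proceed */
		__rwsem_do_wake(sem, RWSEM_WAKE_ANY);
	raw_spin_unlock_irq(&sem->wait_lock);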

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-10 12:38       ` Peter Zijlstra
@ 2016-05-10 13:57         ` Tetsuo Handa
  2016-05-11  7:23         ` Michal Hocko
  1 sibling, 0 replies; 51+ messages in thread
From: Tetsuo Handa @ 2016-05-10 13:57 UTC (permalink / raw)
  To: peterz, mhocko
  Cc: linux-kernel, mingo, tglx, hpa, davem, tony.luck, akpm, chris,
	jcmvbkbc, dave, Waiman.Long

Michal Hocko wrote:
> On Tue 10-05-16 19:43:20, Tetsuo Handa wrote:
> > I hit the "allowing the OOM killer to select the same thread again" problem
> > ( http://lkml.kernel.org/r/20160408113425.GF29820@dhcp22.suse.cz ), but
> > I think that there is a bug in the down_write_killable() series (at least
> > in the "locking, rwsem: introduce basis for down_write_killable" patch).
> > 
> > Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160510-sem.txt.xz .
> [...]
> > 2 threads (PID: 1314 and 1443) are sleeping at rwsem_down_read_failed()
> > but no thread is sleeping at rwsem_down_write_failed_killable().
> > If there is no thread waiting for the write lock, threads waiting for
> > the read lock must be able to run. This suggests that one of the threads
> > which was waiting for the write lock forgot to wake up the reader threads.
> 
> Or that the write lock holder is still keeping the lock held. I do not
> see such a process in your list though. Is it possible that the
> debug_show_all_locks would just miss it as it is not sleeping?

I don't think it is possible. This reproducer
( http://lkml.kernel.org/r/201605061958.HHG48967.JVFtSLFQOFOOMH@I-love.SAKURA.ne.jp )
creates a thread group with two threads, and both of these threads are
sleeping at rwsem_down_read_failed() waiting for mmap_sem.
SysRq-t suggests that PID 1443 called rwsem_down_write_failed_killable()
before calling rwsem_down_read_failed().

By the way, I suggested that you show traces of threads which are using the OOM victim's mm
( http://lkml.kernel.org/r/201603172200.CIE52148.QOVSOHJFMLOFtF@I-love.SAKURA.ne.jp ),
but you said that showing all locks held by !TASK_RUNNING threads would be useful
( http://lkml.kernel.org/r/20160323120716.GE7059@dhcp22.suse.cz ).
Given that you suspect debug_show_all_locks() may have failed to report
a thread which held mmap_sem for write, do you admit that
debug_show_all_locks() is not always useful? (This is a kmallocwd topic,
so I will stop here.)

> 
> > Looking at rwsem_down_read_failed(), reader threads waiting for the
> > writer thread to release the lock are waiting on sem->wait_list list.
> > Looking at __rwsem_down_write_failed_common(), when the writer thread
> > escaped the
> > 
> >                  /* Block until there are no active lockers. */
> >                  do {
> >                          if (signal_pending_state(state, current)) {
> >                                  raw_spin_lock_irq(&sem->wait_lock);
> >                                  ret = ERR_PTR(-EINTR);
> >                                  goto out;
> >                          }
> >                          schedule();
> >                          set_current_state(state);
> >                  } while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> > 
> > loop due to SIGKILL, I think that the writer thread needs to check for
> > remaining threads on sem->wait_list list and wake up reader threads
> > before rwsem_down_write_failed_killable() returns -EINTR.
> 
> I am not sure I understand. The rwsem counter is not write locked while
> the thread is sleeping, nor when we bail out on the pending signal, so
> readers should be able to proceed, no?
> 
I guess __rwsem_do_wake() is needed for waking up the readers, because
I guess the sequence that occurred was

  (1) PID 1314 requested down_read() and succeeded.
  (2) PID 1443 requested down_write_killable() and blocked.
  (3) The OOM killer sent SIGKILL to PID 1314 and PID 1443.
  (4) PID 1443 left down_write_killable() with -EINTR.
  (5) PID 1314 called up_read() and down_read()
      while PID 1443 called down_read().

.

> Or are you suggesting that the failure path should call rwsem_wake?

I don't know how rwsem works. Please consult maintainers.

Peter Zijlstra wrote:
> Mutex is much simpler; it doesn't have to do the reader-vs-writer
> fairness thing.
> 
> However, at the time I was thinking that if we have:
> 
> 	reader (owner)
> 	writer (pending)
> 	reader (blocked on writer)
> 
> and writer would get cancelled, the up_read() would do a wakeup and kick
> the blocked reader.
> 
> But yes, immediately kicking further pending waiters might be better.
> 
> Also, looking at it again; I think we're forgetting to re-adjust the
> BIAS for the cancelled writer.

Yes, I think so.
> 
> Davidlohr, Waiman, can you look at this?
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-10 12:38       ` Peter Zijlstra
  2016-05-10 13:57         ` Tetsuo Handa
@ 2016-05-11  7:23         ` Michal Hocko
  2016-05-11  8:28           ` Michal Hocko
  2016-05-11  8:35           ` Peter Zijlstra
  1 sibling, 2 replies; 51+ messages in thread
From: Michal Hocko @ 2016-05-11  7:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Tue 10-05-16 14:38:06, Peter Zijlstra wrote:
> On Tue, May 10, 2016 at 01:53:20PM +0200, Michal Hocko wrote:
> > On Tue 10-05-16 19:43:20, Tetsuo Handa wrote:
> > > I hit the "allowing the OOM killer to select the same thread again" problem
> > > ( http://lkml.kernel.org/r/20160408113425.GF29820@dhcp22.suse.cz ), but
> > > I think that there is a bug in the down_write_killable() series (at least
> > > in the "locking, rwsem: introduce basis for down_write_killable" patch).
> > > 
> > > Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160510-sem.txt.xz .
> > [...]
> > > 2 threads (PID: 1314 and 1443) are sleeping at rwsem_down_read_failed()
> > > but no thread is sleeping at rwsem_down_write_failed_killable().
> > > If there is no thread waiting for the write lock, threads waiting for
> > > the read lock must be able to run. This suggests that one of the threads
> > > which was waiting for the write lock forgot to wake up the reader threads.
> > 
> > Or that the write lock holder is still keeping the lock held. I do not
> > see such a process in your list though. Is it possible that the
> > debug_show_all_locks would just miss it as it is not sleeping?
> > 
> > > Looking at rwsem_down_read_failed(), reader threads waiting for the
> > > writer thread to release the lock are waiting on sem->wait_list list.
> > > Looking at __rwsem_down_write_failed_common(), when the writer thread
> > > escaped the
> > > 
> > >                  /* Block until there are no active lockers. */
> > >                  do {
> > >                          if (signal_pending_state(state, current)) {
> > >                                  raw_spin_lock_irq(&sem->wait_lock);
> > >                                  ret = ERR_PTR(-EINTR);
> > >                                  goto out;
> > >                          }
> > >                          schedule();
> > >                          set_current_state(state);
> > >                  } while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> > > 
> > > loop due to SIGKILL, I think that the writer thread needs to check for
> > > remaining threads on sem->wait_list list and wake up reader threads
> > > before rwsem_down_write_failed_killable() returns -EINTR.
> > 
> > I am not sure I understand. The rwsem counter is not write locked while
> > the thread is sleeping, nor when we bail out on the pending signal, so
> > readers should be able to proceed, no?
> > 
> > Or are you suggesting that the failure path should call rwsem_wake? I
> > do not see __mutex_lock_common for killable wait doing something like
> > that and rwsem_wake is explicitly documented that it is called after the
> > lock state has been updated already. Now I might be missing something
> > subtle here but I guess the code is correct and it is more likely that
> > the holder of the lock wasn't killed but it is rather holding the lock
> > and doing something else.
> 
> Mutex is much simpler; it doesn't have to do the reader-vs-writer
> fairness thing.
> 
> However, at the time I was thinking that if we have:
> 
> 	reader (owner)
> 	writer (pending)
> 	reader (blocked on writer)
> 
> and writer would get cancelled, the up_read() would do a wakeup and kick
> the blocked reader.
> 
> But yes, immediately kicking further pending waiters might be better.

OK, that makes sense. We shouldn't be waiting for the first reader to
do up_read.

> Also, looking at it again; I think we're forgetting to re-adjust the
> BIAS for the cancelled writer.

Hmm, __rwsem_down_write_failed_common does

	/* undo write bias from down_write operation, stop active locking */
	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);

which should remove the bias AFAIU. Later we do

	if (waiting) {
		count = READ_ONCE(sem->count);

		/*
		 * If there were already threads queued before us and there are
		 * no active writers, the lock must be read owned; so we try to
		 * wake any read locks that were queued ahead of us.
		 */
		if (count > RWSEM_WAITING_BIAS)
			sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);

	} else
		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);

and that might set RWSEM_WAITING_BIAS, but the current holder of the lock
should handle that correctly and wake the waiting tasks IIUC. I will go
and check the code more closely. It is quite easy to get this subtle code
wrong...
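
For reference, the bias constants involved here (as defined for x86-64
in arch/x86/include/asm/rwsem.h at the time; other 64-bit arches use
equivalent values):

#define RWSEM_UNLOCKED_VALUE		0x00000000L
#define RWSEM_ACTIVE_BIAS		0x00000001L
#define RWSEM_ACTIVE_MASK		0xffffffffL
#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

A semaphore with queued waiters and no active lockers thus sits at
exactly RWSEM_WAITING_BIAS, a large negative value, which is what the
count checks above compare against.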
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  7:23         ` Michal Hocko
@ 2016-05-11  8:28           ` Michal Hocko
  2016-05-11  8:44             ` Peter Zijlstra
  2016-05-11  8:35           ` Peter Zijlstra
  1 sibling, 1 reply; 51+ messages in thread
From: Michal Hocko @ 2016-05-11  8:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 09:23:57, Michal Hocko wrote:
> On Tue 10-05-16 14:38:06, Peter Zijlstra wrote:
[...]
> > However, at the time I was thinking that if we have:
> > 
> > 	reader (owner)
> > 	writer (pending)
> > 	reader (blocked on writer)
> > 
> > and writer would get cancelled, the up_read() would do a wakeup and kick
> > the blocked reader.
> > 
> > But yes, immediately kicking further pending waiters might be better.
> 
> OK, that makes sense. We shouldn't be waiting for the first reader to
> do up_read.

I am still trying to wrap my head around the wake-up logic (I managed to
forget all the juicy details). I still do not see the correctness
issue with the current code, so the following should be "just an
optimization". But as I've said, I might be missing something here.

Does the following look correct/reasonable? This is absolutely untested
and more for discussion:
---
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb883b50..726620c97366 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -516,7 +516,27 @@ EXPORT_SYMBOL(rwsem_down_write_failed);
 __visible struct rw_semaphore * __sched
 rwsem_down_write_failed_killable(struct rw_semaphore *sem)
 {
-	return __rwsem_down_write_failed_common(sem, TASK_KILLABLE);
+	struct rw_semaphore *ret;
+
+	ret = __rwsem_down_write_failed_common(sem, TASK_KILLABLE);
+	if (IS_ERR(ret)) {
+		long count;
+
+		raw_spin_lock_irq(&sem->wait_lock);
+		count = READ_ONCE(sem->count);
+
+		/*
+		 * If there were already threads queued before us and there are
+		 * no active writers, the lock must be read owned; so we try to
+		 * wake any read locks that were queued ahead of us so they
+		 * do not have to wait for up_read to wake them up.
+		 */
+		if (count > RWSEM_WAITING_BIAS)
+			sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);
+		raw_spin_unlock_irq(&sem->wait_lock);
+	}
+
+	return ret;
 }
 EXPORT_SYMBOL(rwsem_down_write_failed_killable);
 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  7:23         ` Michal Hocko
  2016-05-11  8:28           ` Michal Hocko
@ 2016-05-11  8:35           ` Peter Zijlstra
  2016-05-11  9:02             ` Michal Hocko
  1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-05-11  8:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed, May 11, 2016 at 09:23:57AM +0200, Michal Hocko wrote:
> On Tue 10-05-16 14:38:06, Peter Zijlstra wrote:

> > Also, looking at it again; I think we're forgetting to re-adjust the
> > BIAS for the cancelled writer.
> 
> Hmm, __rwsem_down_write_failed_common does
> 
> 	/* undo write bias from down_write operation, stop active locking */
> 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
> 
> which should remove the bias AFAIU.

Right; at this point we're neutral wrt bias.

> Later we do
> 
> 	if (waiting) {
> 		count = READ_ONCE(sem->count);
> 
> 		/*
> 		 * If there were already threads queued before us and there are
> 		 * no active writers, the lock must be read owned; so we try to
> 		 * wake any read locks that were queued ahead of us.
> 		 */
> 		if (count > RWSEM_WAITING_BIAS)
> 			sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);
> 
> 	} else
> 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
> 
> and that might set RWSEM_WAITING_BIAS, but the current holder of the lock
> should handle that correctly and wake the waiting tasks IIUC. I will go
> and check the code more closely. It is quite easy to get this subtle code
> wrong..

Subtle; yes.

So if you look at rwsem_try_write_lock() -- traditionally the only way
to exit this wait loop -- you see it does:

	if (count == RWSEM_WAITING_BIAS &&
	    cmpxchg_acquire(&sem->count, RWSEM_WAITING_BIAS,
		    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
		if (!list_is_singular(&sem->wait_list))
			rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
		rwsem_set_owner(sem);
		return true;
	}

Which ends up clearing RWSEM_WAITING_BIAS if we were the only waiter --
or rather, it always clears WAITING, but then tests the list and re-sets
it if there's more than one waiter on it.

Now, the signal break doesn't clear WAITING if we were the only waiter
on the list; which means any further down_read() will block (I didn't
look at what a subsequent down_write() would do).

So I think we need something like this, to clear WAITING if we leave
the list empty.

Does that make sense?

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb883b50..7011dd1c286c 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -489,6 +489,8 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 		do {
 			if (signal_pending_state(state, current)) {
 				raw_spin_lock_irq(&sem->wait_lock);
+				if (list_is_singular(&sem->wait_list))
+					rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
 				ret = ERR_PTR(-EINTR);
 				goto out;
 			}

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  8:28           ` Michal Hocko
@ 2016-05-11  8:44             ` Peter Zijlstra
  2016-05-11  9:04               ` Michal Hocko
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-05-11  8:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed, May 11, 2016 at 10:28:53AM +0200, Michal Hocko wrote:

> Does the following look correct/reasonable? This is absolutely untested
> and more for discussion:

I would much rather see it in common; something like so perhaps.

---
 kernel/locking/rwsem-xadd.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb883b50..5d7f2831a475 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -487,11 +487,9 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 
 		/* Block until there are no active lockers. */
 		do {
-			if (signal_pending_state(state, current)) {
-				raw_spin_lock_irq(&sem->wait_lock);
-				ret = ERR_PTR(-EINTR);
-				goto out;
-			}
+			if (signal_pending_state(state, current))
+				goto out_nolock;
+
 			schedule();
 			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
@@ -504,6 +502,18 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 	raw_spin_unlock_irq(&sem->wait_lock);
 
 	return ret;
+
+out_nolock:
+	__set_current_state(TASK_RUNNING);
+	raw_spin_lock_irq(&sem->wait_lock);
+	list_del(&waiter.list);
+	if (list_empty(&sem->wait_list))
+		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
+	else
+		__rwsem_do_wake(sem, RWSEM_WAKE_READERS);
+	raw_spin_unlock_irq(&sem->wait_lock);
+
+	return ERR_PTR(-EINTR);
 }
 
 __visible struct rw_semaphore * __sched

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  8:35           ` Peter Zijlstra
@ 2016-05-11  9:02             ` Michal Hocko
  0 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2016-05-11  9:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 10:35:12, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 09:23:57AM +0200, Michal Hocko wrote:
> > On Tue 10-05-16 14:38:06, Peter Zijlstra wrote:
> 
> > > Also, looking at it again; I think we're forgetting to re-adjust the
> > > BIAS for the cancelled writer.
> > 
> > Hmm, __rwsem_down_write_failed_common does
> > 
> > 	/* undo write bias from down_write operation, stop active locking */
> > 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
> > 
> > which should remove the bias AFAIU.
> 
> Right; at this point we're neutral wrt bias.
> 
> > Later we do
> > 
> > 	if (waiting) {
> > 		count = READ_ONCE(sem->count);
> > 
> > 		/*
> > 		 * If there were already threads queued before us and there are
> > 		 * no active writers, the lock must be read owned; so we try to
> > 		 * wake any read locks that were queued ahead of us.
> > 		 */
> > 		if (count > RWSEM_WAITING_BIAS)
> > 			sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);
> > 
> > 	} else
> > 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
> > 
> > and that might set RWSEM_WAITING_BIAS, but the current holder of the lock
> > should handle that correctly and wake the waiting tasks IIUC. I will go
> > and check the code more closely. It is quite easy to get this subtle code
> > wrong..
> 
> Subtle; yes.
> 
> So if you look at rwsem_try_write_lock() -- traditionally the only way
> to exit this wait loop -- you see it does:
> 
> 	if (count == RWSEM_WAITING_BIAS &&
> 	    cmpxchg_acquire(&sem->count, RWSEM_WAITING_BIAS,
> 		    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
> 		if (!list_is_singular(&sem->wait_list))
> 			rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
> 		rwsem_set_owner(sem);
> 		return true;
> 	}
> 
> Which ends up clearing RWSEM_WAITING_BIAS if we were the only waiter --
> or rather, it always clears WAITING, but then tests the list and re-sets
> it if there's more than one waiter on it.
> 
> Now, the signal break doesn't clear WAITING if we were the only waiter
> on the list; which means any further down_read() will block (I didn't
> look at what a subsequent down_write() would do).

I was staring at this part as well but then I convinced myself that this
is OK because rwsem_down_read_failed does:

	if (list_empty(&sem->wait_list))
		adjustment += RWSEM_WAITING_BIAS;
	list_add_tail(&waiter.list, &sem->wait_list);

	/* we're now waiting on the lock, but no longer actively locking */
	count = rwsem_atomic_update(adjustment, sem);

	/* If there are no active locks, wake the front queued process(es).
	 *
	 * If there are no writers and we are first in the queue,
	 * wake our own waiter to join the existing active readers !
	 */
	if (count == RWSEM_WAITING_BIAS ||
	    (count > RWSEM_WAITING_BIAS &&
	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
		sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);

__rwsem_do_wake should then see all the readers (including the current
one) and wake them up and set waiter->task to NULL to allow the current
one to break out of the loop as well.
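
The loop in question, schematically (the reader-side wait loop of
rwsem_down_read_failed() at the time):

	/* wait to be given the lock */
	while (true) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (!waiter.task)
			break;
		schedule();
	}

	__set_current_state(TASK_RUNNING);

Since __rwsem_do_wake() clears waiter->task before calling
wake_up_process(), even a reader that ends up waking itself breaks out
of this loop.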

> So I think we need something like this, to clear WAITING if we leave
> the list empty.

But maybe this is the correct way to go.

> Does that make sense?

I do not see an immediate problem with this. Maybe Tetsuo can try this
out and see if it really makes a difference.

> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index df4dcb883b50..7011dd1c286c 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -489,6 +489,8 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
>  		do {
>  			if (signal_pending_state(state, current)) {
>  				raw_spin_lock_irq(&sem->wait_lock);
> +				if (list_is_singular(&sem->wait_list))
> +					rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
>  				ret = ERR_PTR(-EINTR);
>  				goto out;
>  			}

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  8:44             ` Peter Zijlstra
@ 2016-05-11  9:04               ` Michal Hocko
  2016-05-11  9:17                 ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Michal Hocko @ 2016-05-11  9:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 10:44:01, Peter Zijlstra wrote:
[...]
> @@ -504,6 +502,18 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
>  	raw_spin_unlock_irq(&sem->wait_lock);
>  
>  	return ret;
> +
> +out_nolock:
> +	__set_current_state(TASK_RUNNING);
> +	raw_spin_lock_irq(&sem->wait_lock);
> +	list_del(&waiter.list);
> +	if (list_empty(&sem->wait_list))
> +		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
> +	else
> +		__rwsem_do_wake(sem, RWSEM_WAKE_READERS);
> +	raw_spin_unlock_irq(&sem->wait_lock);
> +
> +	return ERR_PTR(-EINTR);
>  }

Looks much better but don't we have to check the count for potentially
pending writers?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  9:04               ` Michal Hocko
@ 2016-05-11  9:17                 ` Peter Zijlstra
  2016-05-11  9:31                   ` Michal Hocko
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-05-11  9:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed, May 11, 2016 at 11:04:42AM +0200, Michal Hocko wrote:
> On Wed 11-05-16 10:44:01, Peter Zijlstra wrote:
> [...]
> > @@ -504,6 +502,18 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
> >  	raw_spin_unlock_irq(&sem->wait_lock);
> >  
> >  	return ret;
> > +
> > +out_nolock:
> > +	__set_current_state(TASK_RUNNING);
> > +	raw_spin_lock_irq(&sem->wait_lock);
> > +	list_del(&waiter.list);
> > +	if (list_empty(&sem->wait_list))
> > +		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
> > +	else
> > +		__rwsem_do_wake(sem, RWSEM_WAKE_READERS);
> > +	raw_spin_unlock_irq(&sem->wait_lock);
> > +
> > +	return ERR_PTR(-EINTR);
> >  }
> 
> Looks much better but don't we have to check the count for potentially
> pending writers?

Ah, so I was thinking that if we get here, there must still be an owner,
otherwise we'd have acquired the lock. And if there is an owner, we
cannot go wake writers. Hence the WAKE_READERS thing.

Then again, WAKE_ANY would do no harm, in that if we do wake a pending
writer that cannot proceed, it will simply go back to sleep
again.

So yeah, maybe WAKE_ANY is the prudent thing to do.
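
For reference, the wake types being discussed (from
kernel/locking/rwsem-xadd.c):

enum rwsem_wake_type {
	RWSEM_WAKE_ANY,		/* Wake whatever's at head of wait list */
	RWSEM_WAKE_READERS,	/* Wake readers only */
	RWSEM_WAKE_READ_OWNED	/* Waker thread holds the read lock */
};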

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  9:17                 ` Peter Zijlstra
@ 2016-05-11  9:31                   ` Michal Hocko
  2016-05-11  9:41                     ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Michal Hocko @ 2016-05-11  9:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 11:17:33, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 11:04:42AM +0200, Michal Hocko wrote:
> > On Wed 11-05-16 10:44:01, Peter Zijlstra wrote:
> > [...]
> > > @@ -504,6 +502,18 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
> > >  	raw_spin_unlock_irq(&sem->wait_lock);
> > >  
> > >  	return ret;
> > > +
> > > +out_nolock:
> > > +	__set_current_state(TASK_RUNNING);
> > > +	raw_spin_lock_irq(&sem->wait_lock);
> > > +	list_del(&waiter.list);
> > > +	if (list_empty(&sem->wait_list))
> > > +		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
> > > +	else
> > > +		__rwsem_do_wake(sem, RWSEM_WAKE_READERS);
> > > +	raw_spin_unlock_irq(&sem->wait_lock);
> > > +
> > > +	return ERR_PTR(-EINTR);
> > >  }
> > 
> > Looks much better but don't we have to check the count for potentially
> > pending writers?
> 
> Ah, so I was thinking that if we get here, there must still be an owner,
> otherwise we'd have acquired the lock. And if there is an owner, we
> cannot go wake writers. Hence the WAKE_READERS thing.

I was worried about the case where the owner is a writer and we would
wake readers, but I had missed that this wouldn't happen because of

	if (wake_type != RWSEM_WAKE_READ_OWNED) {
		adjustment = RWSEM_ACTIVE_READ_BIAS;
 try_reader_grant:
		oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
			/* A writer stole the lock. Undo our reader grant. */
			if (rwsem_atomic_update(-adjustment, sem) &
						RWSEM_ACTIVE_MASK)
				goto out;
			/* Last active locker left. Retry waking readers. */
			goto try_reader_grant;
		}
	}

> Then again, WAKE_ANY would do no harm, in that if we do wake a pending
> writer that cannot proceed, it will simply go back to sleep
> again.

true

> So yeah, maybe WAKE_ANY is the prudent thing to do.

I guess so.

Care to cook up a full patch?

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  9:31                   ` Michal Hocko
@ 2016-05-11  9:41                     ` Peter Zijlstra
  2016-05-11 13:59                       ` Michal Hocko
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-05-11  9:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed, May 11, 2016 at 11:31:27AM +0200, Michal Hocko wrote:

> Care to cook up a full patch?

compile tested only, if someone could please test it?

I'll go write a Changelog if it turns out to actually work ;-)

---
 kernel/locking/rwsem-xadd.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb883b50..09e30c6225e5 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -487,23 +487,32 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 
 		/* Block until there are no active lockers. */
 		do {
-			if (signal_pending_state(state, current)) {
-				raw_spin_lock_irq(&sem->wait_lock);
-				ret = ERR_PTR(-EINTR);
-				goto out;
-			}
+			if (signal_pending_state(state, current))
+				goto out_nolock;
+
 			schedule();
 			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
-out:
 	__set_current_state(TASK_RUNNING);
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
 	return ret;
+
+out_nolock:
+	__set_current_state(TASK_RUNNING);
+	raw_spin_lock_irq(&sem->wait_lock);
+	list_del(&waiter.list);
+	if (list_empty(&sem->wait_list))
+		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
+	else
+		__rwsem_do_wake(sem, RWSEM_WAKE_ANY);
+	raw_spin_unlock_irq(&sem->wait_lock);
+
+	return ERR_PTR(-EINTR);
 }
 
 __visible struct rw_semaphore * __sched

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11  9:41                     ` Peter Zijlstra
@ 2016-05-11 13:59                       ` Michal Hocko
  2016-05-11 18:03                         ` Michal Hocko
  0 siblings, 1 reply; 51+ messages in thread
From: Michal Hocko @ 2016-05-11 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 11:41:28, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 11:31:27AM +0200, Michal Hocko wrote:
> 
> > Care to cook up a full patch?
> 
> compile tested only, if someone could please test it?

I have tried to run the test case from Tetsuo[1] with a small printk to
show the interrupted writer case:
[ 2753.596678] XXX: Writer interrupted. Woken waiters:0
[ 2998.266978] XXX: Writer interrupted. Woken waiters:0

which means the rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem) path,
which is the problematic case, was taken. oom_reaper was always able
to succeed, so I guess the patch works as expected. I will leave the
test running for longer to be sure.

[1] http://lkml.kernel.org/r/201605061958.HHG48967.JVFtSLFQOFOOMH@I-love.SAKURA.ne.jp
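
(The debugging printk itself was not posted; a minimal sketch of what it
could have looked like, hung off the out_nolock path of the patch quoted
below -- the woken flag is purely illustrative:)

out_nolock:
	__set_current_state(TASK_RUNNING);
	raw_spin_lock_irq(&sem->wait_lock);
	list_del(&waiter.list);
	if (list_empty(&sem->wait_list)) {
		/* nobody left behind us, drop the stale waiting bias */
		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
		woken = 0;
	} else {
		__rwsem_do_wake(sem, RWSEM_WAKE_ANY);
		woken = 1;
	}
	printk("XXX: Writer interrupted. Woken waiters:%d\n", woken);
	raw_spin_unlock_irq(&sem->wait_lock);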
> 
> I'll go write a Changelog if it turns out to actually work ;-)
> 
> ---
>  kernel/locking/rwsem-xadd.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index df4dcb883b50..09e30c6225e5 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -487,23 +487,32 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
>  
>  		/* Block until there are no active lockers. */
>  		do {
> -			if (signal_pending_state(state, current)) {
> -				raw_spin_lock_irq(&sem->wait_lock);
> -				ret = ERR_PTR(-EINTR);
> -				goto out;
> -			}
> +			if (signal_pending_state(state, current))
> +				goto out_nolock;
> +
>  			schedule();
>  			set_current_state(state);
>  		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
>  
>  		raw_spin_lock_irq(&sem->wait_lock);
>  	}
> -out:
>  	__set_current_state(TASK_RUNNING);
>  	list_del(&waiter.list);
>  	raw_spin_unlock_irq(&sem->wait_lock);
>  
>  	return ret;
> +
> +out_nolock:
> +	__set_current_state(TASK_RUNNING);
> +	raw_spin_lock_irq(&sem->wait_lock);
> +	list_del(&waiter.list);
> +	if (list_empty(&sem->wait_list))
> +		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
> +	else
> +		__rwsem_do_wake(sem, RWSEM_WAKE_ANY);
> +	raw_spin_unlock_irq(&sem->wait_lock);
> +
> +	return ERR_PTR(-EINTR);
>  }
>  
>  __visible struct rw_semaphore * __sched

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11 13:59                       ` Michal Hocko
@ 2016-05-11 18:03                         ` Michal Hocko
  2016-05-12 11:57                           ` [PATCH] locking, rwsem: Fix down_write_killable() Peter Zijlstra
  2016-05-12 12:12                           ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Peter Zijlstra
  0 siblings, 2 replies; 51+ messages in thread
From: Michal Hocko @ 2016-05-11 18:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed 11-05-16 15:59:38, Michal Hocko wrote:
> On Wed 11-05-16 11:41:28, Peter Zijlstra wrote:
> > On Wed, May 11, 2016 at 11:31:27AM +0200, Michal Hocko wrote:
> > 
> > > Care to cook up a full patch?
> > 
> > compile tested only, if someone could please test it?
> 
> I have tried to run the test case from Tetsuo[1] with a small printk to
> show the interrupted writer case:
> [ 2753.596678] XXX: Writer interrupted. Woken waiters:0
> [ 2998.266978] XXX: Writer interrupted. Woken waiters:0
> 
> which means the rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem) path,
> which is the problematic case, was taken. oom_reaper was always able
> to succeed, so I guess the patch works as expected. I will leave the
> test running for longer to be sure.

And just for reference, I am able to reproduce the lockup without the
patch applied, using the same test case and a debugging patch:

[ 1522.036379] XXX interrupted. list_is_singular:1
[ 1523.040462] oom_reaper: unable to reap pid:3736 (tgid=3736)

I still cannot say I understand why the pending
RWSEM_WAITING_BIAS matters; I would probably need to look at the code
again with a clear head, __rwsem_do_wake is quite tricky...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH] locking, rwsem: Fix down_write_killable()
  2016-05-11 18:03                         ` Michal Hocko
@ 2016-05-12 11:57                           ` Peter Zijlstra
  2016-05-12 12:15                             ` [tip:locking/rwsem] locking/rwsem: " tip-bot for Peter Zijlstra
                                               ` (2 more replies)
  2016-05-12 12:12                           ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Peter Zijlstra
  1 sibling, 3 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-05-12 11:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed, May 11, 2016 at 08:03:46PM +0200, Michal Hocko wrote:
> On Wed 11-05-16 15:59:38, Michal Hocko wrote:
> > On Wed 11-05-16 11:41:28, Peter Zijlstra wrote:
> > > On Wed, May 11, 2016 at 11:31:27AM +0200, Michal Hocko wrote:
> > > 
> > > > Care to cook up a full patch?
> > > 
> > > compile tested only, if someone could please test it?
> > 
> > I have tried to run the test case from Tetsuo[1] with a small printk to
> > show the interrupted writer case:
> > [ 2753.596678] XXX: Writer interrupted. Woken waiters:0
> > [ 2998.266978] XXX: Writer interrupted. Woken waiters:0
> > 
> > which means the rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem) path,
> > which is the problematic case, was taken. oom_reaper was always able
> > to succeed, so I guess the patch works as expected. I will leave the
> > test running for longer to be sure.
> 
> And just for reference, I am able to reproduce the lockup without the
> patch applied, using the same test case and a debugging patch:
> 
> [ 1522.036379] XXX interrupted. list_is_singular:1
> [ 1523.040462] oom_reaper: unable to reap pid:3736 (tgid=3736)

Here is the complete patch. Ingo, could you make it appear in the right
tip branch?


---
Subject: locking, rwsem: Fix down_write_killable()
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 11 May 2016 11:41:28 +0200

The new signal_pending exit path in __rwsem_down_write_failed_common()
was fingered by Tetsuo Handa as breaking his kernel.

Upon inspection it was found that there are two things wrong with it;

 - it forgets to remove WAITING_BIAS if it leaves the list empty, or
 - it forgets to wake further waiters that were blocked on the now
   removed waiter.

Especially the first issue causes new lock attempts to block and stall
indefinitely, as the code assumes that pending waiters mean there is
an owner that will wake when it releases the lock.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rwsem-xadd.c |   21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -487,23 +487,32 @@ __rwsem_down_write_failed_common(struct
 
 		/* Block until there are no active lockers. */
 		do {
-			if (signal_pending_state(state, current)) {
-				raw_spin_lock_irq(&sem->wait_lock);
-				ret = ERR_PTR(-EINTR);
-				goto out;
-			}
+			if (signal_pending_state(state, current))
+				goto out_nolock;
+
 			schedule();
 			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
-out:
 	__set_current_state(TASK_RUNNING);
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
 	return ret;
+
+out_nolock:
+	__set_current_state(TASK_RUNNING);
+	raw_spin_lock_irq(&sem->wait_lock);
+	list_del(&waiter.list);
+	if (list_empty(&sem->wait_list))
+		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
+	else
+		__rwsem_do_wake(sem, RWSEM_WAKE_ANY);
+	raw_spin_unlock_irq(&sem->wait_lock);
+
+	return ERR_PTR(-EINTR);
 }
 
 __visible struct rw_semaphore * __sched

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-11 18:03                         ` Michal Hocko
  2016-05-12 11:57                           ` [PATCH] locking, rwsem: Fix down_write_killable() Peter Zijlstra
@ 2016-05-12 12:12                           ` Peter Zijlstra
  2016-05-12 12:19                             ` Michal Hocko
  1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-05-12 12:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Wed, May 11, 2016 at 08:03:46PM +0200, Michal Hocko wrote:
> I still cannot say I understand why the pending
> RWSEM_WAITING_BIAS matters; I would probably need to look at the code
> again with a clear head, __rwsem_do_wake is quite tricky...

Ah, you're asking why an unconditional __rwsem_do_wake(ANY) isn't enough?

Because; if at that point there's nobody waiting, we're left with an
empty list and WAITING_BIAS set. This in turn will make all fast paths
fail.

Look at rwsem_down_read_failed() for instance; if we enter that we'll
unconditionally queue ourselves, with nobody left to come wake us.
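
Concretely, the fast path that a stale WAITING_BIAS breaks looks roughly
like this (the asm-generic variant; x86 open-codes the same logic):

static inline void __down_read(struct rw_semaphore *sem)
{
	if (unlikely(atomic_long_inc_return((atomic_long_t *)&sem->count) <= 0))
		rwsem_down_read_failed(sem);
}

With the count left at RWSEM_WAITING_BIAS and no owner, the increment
still yields a negative value, so every subsequent down_read() is forced
into the slow path and queues itself.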

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [tip:locking/rwsem] locking/rwsem: Fix down_write_killable()
  2016-05-12 11:57                           ` [PATCH] locking, rwsem: Fix down_write_killable() Peter Zijlstra
@ 2016-05-12 12:15                             ` tip-bot for Peter Zijlstra
  2016-05-12 16:59                             ` [PATCH] locking, rwsem: " Michal Hocko
  2016-05-15 20:57                             ` [tip:locking/rwsem] locking/rwsem: " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 51+ messages in thread
From: tip-bot for Peter Zijlstra @ 2016-05-12 12:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, torvalds, eranian, vincent.weaver, davem,
	alexander.shishkin, chris, hpa, mhocko, peterz, jolsa, tony.luck,
	dave, acme, akpm, penguin-kernel, Waiman.Long, jcmvbkbc, tglx

Commit-ID:  6d4acadc1ca37a3aca10892d32fb7205bc5b0cfc
Gitweb:     http://git.kernel.org/tip/6d4acadc1ca37a3aca10892d32fb7205bc5b0cfc
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 12 May 2016 13:57:45 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 12 May 2016 14:10:39 +0200

locking/rwsem: Fix down_write_killable()

The new signal_pending exit path in __rwsem_down_write_failed_common()
was fingered by Tetsuo Handa as breaking his kernel.

Upon inspection it was found that there are two things wrong with it;

 - it forgets to remove WAITING_BIAS if it leaves the list empty, or
 - it forgets to wake further waiters that were blocked on the now
   removed waiter.

Especially the first issue causes new lock attempts to block and stall
indefinitely, as the code assumes that pending waiters mean there is
an owner that will wake when it releases the lock.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Waiman Long <Waiman.Long@hpe.com>
Link: http://lkml.kernel.org/r/20160512115745.GP3192@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rwsem-xadd.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb8..09e30c6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -487,23 +487,32 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 
 		/* Block until there are no active lockers. */
 		do {
-			if (signal_pending_state(state, current)) {
-				raw_spin_lock_irq(&sem->wait_lock);
-				ret = ERR_PTR(-EINTR);
-				goto out;
-			}
+			if (signal_pending_state(state, current))
+				goto out_nolock;
+
 			schedule();
 			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
-out:
 	__set_current_state(TASK_RUNNING);
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
 	return ret;
+
+out_nolock:
+	__set_current_state(TASK_RUNNING);
+	raw_spin_lock_irq(&sem->wait_lock);
+	list_del(&waiter.list);
+	if (list_empty(&sem->wait_list))
+		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
+	else
+		__rwsem_do_wake(sem, RWSEM_WAKE_ANY);
+	raw_spin_unlock_irq(&sem->wait_lock);
+
+	return ERR_PTR(-EINTR);
 }
 
 __visible struct rw_semaphore * __sched

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-12 12:12                           ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Peter Zijlstra
@ 2016-05-12 12:19                             ` Michal Hocko
  2016-05-12 13:58                               ` Peter Zijlstra
  2016-05-12 19:42                               ` Waiman Long
  0 siblings, 2 replies; 51+ messages in thread
From: Michal Hocko @ 2016-05-12 12:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Thu 12-05-16 14:12:04, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 08:03:46PM +0200, Michal Hocko wrote:
> > I still cannot say I understand why the pending
> > RWSEM_WAITING_BIAS matters; I would probably need to look at the code
> > again with a clear head, __rwsem_do_wake is quite tricky...
> 
> Ah, you're asking why an unconditional __rwsem_do_wake(ANY) isn't enough?
> 
> Because; if at that point there's nobody waiting, we're left with an
> empty list and WAITING_BIAS set. This in turn will make all fast paths
> fail.
> 
> Look at rwsem_down_read_failed() for instance; if we enter that we'll
> unconditionally queue ourselves, with nobody left to come wake us.

This is still not clear to me because rwsem_down_read_failed will call
__rwsem_do_wake if the count is RWSEM_WAITING_BIAS, so we shouldn't go
to sleep and should get the lock. So you are right that we would force
everybody to the slow path, which is not great but shouldn't cause
incorrect behavior. I guess I must be missing something obvious here...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-12 12:19                             ` Michal Hocko
@ 2016-05-12 13:58                               ` Peter Zijlstra
  2016-05-12 19:42                               ` Waiman Long
  1 sibling, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-05-12 13:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Thu, May 12, 2016 at 02:19:07PM +0200, Michal Hocko wrote:
> On Thu 12-05-16 14:12:04, Peter Zijlstra wrote:
> > On Wed, May 11, 2016 at 08:03:46PM +0200, Michal Hocko wrote:
> > > I still cannot say I understand why the pending
> > > RWSEM_WAITING_BIAS matters; I would probably need to look at the code
> > > again with a clear head, __rwsem_do_wake is quite tricky...
> > 
> > Ah, you're asking why an unconditional __rwsem_do_wake(ANY) isn't enough?
> > 
> > Because; if at that point there's nobody waiting, we're left with an
> > empty list and WAITING_BIAS set. This in turn will make all fast paths
> > fail.
> > 
> > Look at rwsem_down_read_failed() for instance; if we enter that we'll
> > unconditionally queue ourselves, with nobody left to come wake us.
> 
> This is still not clear to me because rwsem_down_read_failed will call
> __rwsem_do_wake if the count is RWSEM_WAITING_BIAS, so we shouldn't go
> to sleep and should get the lock. So you are right that we would force
> everybody to the slow path, which is not great but shouldn't cause
> incorrect behavior. I guess I must be missing something obvious here...


Ah me too; I missed the obvious: we do the __rwsem_do_wake() after we
add ourselves to the list, which means we'll also wake ourselves.

I'll do some more thinking..
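
That is, the ordering in rwsem_down_read_failed() (quoted earlier in the
thread) is:

	if (list_empty(&sem->wait_list))
		adjustment += RWSEM_WAITING_BIAS;
	list_add_tail(&waiter.list, &sem->wait_list);

	/* we're now waiting on the lock, but no longer actively locking */
	count = rwsem_atomic_update(adjustment, sem);

	/*
	 * The wake-up check below runs after the caller has queued
	 * itself, so when it fires, the waiter at the head of the list
	 * -- possibly the caller itself -- is the one woken.
	 */
	if (count == RWSEM_WAITING_BIAS ||
	    (count > RWSEM_WAITING_BIAS &&
	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
		sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);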

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] locking, rwsem: Fix down_write_killable()
  2016-05-12 11:57                           ` [PATCH] locking, rwsem: Fix down_write_killable() Peter Zijlstra
  2016-05-12 12:15                             ` [tip:locking/rwsem] locking/rwsem: " tip-bot for Peter Zijlstra
@ 2016-05-12 16:59                             ` Michal Hocko
  2016-05-15 20:57                             ` [tip:locking/rwsem] locking/rwsem: " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2016-05-12 16:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, Davidlohr Bueso, Waiman Long

On Thu 12-05-16 13:57:45, Peter Zijlstra wrote:
[...]
> Subject: locking, rwsem: Fix down_write_killable()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed, 11 May 2016 11:41:28 +0200
> 
> The new signal_pending exit path in __rwsem_down_write_failed_common()
> was fingered by Tetsuo Handa as breaking his kernel.
> 
> Upon inspection it was found that there are two things wrong with it;
> 
>  - it forgets to remove WAITING_BIAS if it leaves the list empty, or
>  - it forgets to wake further waiters that were blocked on the now
>    removed waiter.
> 
> Especially the first issue causes new lock attempts to block and stall
> indefinitely, as the code assumes that pending waiters mean there is
> an owner that will wake when it releases the lock.

Just to prevent any confusion: I think the patch is clearly doing the
proper thing. Even if leaving WAITING_BIAS set were OK for some reason,
it would be too subtle, and it is better to clean up the state when
failing. So the follow-up discussion is just to clarify what the heck
was going on here.

> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Waiman Long <Waiman.Long@hpe.com>
> Cc: Chris Zankel <chris@zankel.net>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Tested-by: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/locking/rwsem-xadd.c |   21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -487,23 +487,32 @@ __rwsem_down_write_failed_common(struct
>  
>  		/* Block until there are no active lockers. */
>  		do {
> -			if (signal_pending_state(state, current)) {
> -				raw_spin_lock_irq(&sem->wait_lock);
> -				ret = ERR_PTR(-EINTR);
> -				goto out;
> -			}
> +			if (signal_pending_state(state, current))
> +				goto out_nolock;
> +
>  			schedule();
>  			set_current_state(state);
>  		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
>  
>  		raw_spin_lock_irq(&sem->wait_lock);
>  	}
> -out:
>  	__set_current_state(TASK_RUNNING);
>  	list_del(&waiter.list);
>  	raw_spin_unlock_irq(&sem->wait_lock);
>  
>  	return ret;
> +
> +out_nolock:
> +	__set_current_state(TASK_RUNNING);
> +	raw_spin_lock_irq(&sem->wait_lock);
> +	list_del(&waiter.list);
> +	if (list_empty(&sem->wait_list))
> +		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
> +	else
> +		__rwsem_do_wake(sem, RWSEM_WAKE_ANY);
> +	raw_spin_unlock_irq(&sem->wait_lock);
> +
> +	return ERR_PTR(-EINTR);
>  }
>  
>  __visible struct rw_semaphore * __sched

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable
  2016-05-12 12:19                             ` Michal Hocko
  2016-05-12 13:58                               ` Peter Zijlstra
@ 2016-05-12 19:42                               ` Waiman Long
  1 sibling, 0 replies; 51+ messages in thread
From: Waiman Long @ 2016-05-12 19:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Peter Zijlstra, Tetsuo Handa, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, Davidlohr Bueso

On 05/12/2016 08:19 AM, Michal Hocko wrote:
> On Thu 12-05-16 14:12:04, Peter Zijlstra wrote:
>> On Wed, May 11, 2016 at 08:03:46PM +0200, Michal Hocko wrote:
>>> I still cannot say I understand why the pending
>>> RWSEM_WAITING_BIAS matters; I would probably need to look at the code
>>> again with a clear head, __rwsem_do_wake is quite tricky...
>> Ah, you're asking why an unconditional __rwsem_do_wake(ANY) isn't enough?
>>
>> Because; if at that point there's nobody waiting, we're left with an
>> empty list and WAITING_BIAS set. This in turn will make all fast paths
>> fail.
>>
>> Look at rwsem_down_read_failed() for instance; if we enter that we'll
>> unconditionally queue ourselves, with nobody left to come wake us.
> This is still not clear to me because rwsem_down_read_failed will call
> __rwsem_do_wake if the count is RWSEM_WAITING_BIAS, so we shouldn't go
> to sleep and should get the lock. So you are right that we would force
> everybody to the slow path, which is not great but shouldn't cause
> incorrect behavior. I guess I must be missing something obvious here...

Because of writer lock stealing, having a count of RWSEM_WAITING_BIAS
doesn't mean the reader can be sure to get the lock even if it is the
first one in the queue. Calling __rwsem_do_wake() will take care of all
the locking and queue-checking work. Yes, I think it is a bit odd that a
task may end up waking itself. Maybe we can add code like:

--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -202,7 +202,8 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 		 */
 		smp_mb();
 		waiter->task = NULL;
-		wake_up_process(tsk);
+		if (tsk != current)
+			wake_up_process(tsk);
 		put_task_struct(tsk);
 	} while (--loop);

Cheers,
Longman

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [tip:locking/rwsem] locking/rwsem: Fix down_write_killable()
  2016-05-12 11:57                           ` [PATCH] locking, rwsem: Fix down_write_killable() Peter Zijlstra
  2016-05-12 12:15                             ` [tip:locking/rwsem] locking/rwsem: " tip-bot for Peter Zijlstra
  2016-05-12 16:59                             ` [PATCH] locking, rwsem: " Michal Hocko
@ 2016-05-15 20:57                             ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 51+ messages in thread
From: tip-bot for Peter Zijlstra @ 2016-05-15 20:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jcmvbkbc, linux-kernel, penguin-kernel, acme, Waiman.Long,
	eranian, mingo, tony.luck, torvalds, mhocko, tglx,
	vincent.weaver, dave, jolsa, peterz, davem, alexander.shishkin,
	hpa, chris, akpm

Commit-ID:  04cafed7fc19a8010771c788708ac97c405fc3de
Gitweb:     http://git.kernel.org/tip/04cafed7fc19a8010771c788708ac97c405fc3de
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 12 May 2016 13:57:45 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 15 May 2016 22:55:00 +0200

locking/rwsem: Fix down_write_killable()

The new signal_pending exit path in __rwsem_down_write_failed_common()
was fingered by Tetsuo Handa as breaking his kernel.

Upon inspection it was found that there are two things wrong with it;

 - it forgets to remove WAITING_BIAS if it leaves the list empty, or
 - it forgets to wake further waiters that were blocked on the now
   removed waiter.

Especially the first issue causes new lock attempts to block and stall
indefinitely, as the code assumes that pending waiters mean there is
an owner that will wake when it releases the lock.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Waiman Long <Waiman.Long@hpe.com>
Link: http://lkml.kernel.org/r/20160512115745.GP3192@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rwsem-xadd.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb8..09e30c6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -487,23 +487,32 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 
 		/* Block until there are no active lockers. */
 		do {
-			if (signal_pending_state(state, current)) {
-				raw_spin_lock_irq(&sem->wait_lock);
-				ret = ERR_PTR(-EINTR);
-				goto out;
-			}
+			if (signal_pending_state(state, current))
+				goto out_nolock;
+
 			schedule();
 			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
-out:
 	__set_current_state(TASK_RUNNING);
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
 	return ret;
+
+out_nolock:
+	__set_current_state(TASK_RUNNING);
+	raw_spin_lock_irq(&sem->wait_lock);
+	list_del(&waiter.list);
+	if (list_empty(&sem->wait_list))
+		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
+	else
+		__rwsem_do_wake(sem, RWSEM_WAKE_ANY);
+	raw_spin_unlock_irq(&sem->wait_lock);
+
+	return ERR_PTR(-EINTR);
 }
 
 __visible struct rw_semaphore * __sched

^ permalink raw reply related	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2016-05-15 20:59 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-01 11:04 [PATCH 0/11] introduce down_write_killable for rw_semaphore v2 Michal Hocko
2016-04-01 11:04 ` [PATCH 01/11] locking, rwsem: get rid of __down_write_nested Michal Hocko
2016-04-02  0:28   ` Davidlohr Bueso
2016-04-01 11:04 ` [PATCH 02/11] locking, rwsem: drop explicit memory barriers Michal Hocko
2016-04-02  1:17   ` Davidlohr Bueso
2016-04-04  9:03     ` Michal Hocko
2016-04-04  9:06       ` [PATCH 1/2] xtensa, rwsem: drop superfluous arch specific implementation Michal Hocko
2016-04-04  9:06         ` [PATCH 2/2] sh, " Michal Hocko
2016-04-06  9:26           ` Peter Zijlstra
2016-04-06  9:50             ` Geert Uytterhoeven
2016-04-06 10:27               ` Peter Zijlstra
2016-04-04 10:23         ` [PATCH 1/2] xtensa, " Max Filippov
2016-04-06  9:06     ` [PATCH] sparc, " Michal Hocko
2016-04-01 11:04 ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Michal Hocko
2016-04-02  4:41   ` Davidlohr Bueso
2016-04-04  9:17     ` Michal Hocko
2016-04-04  9:21       ` Peter Zijlstra
2016-04-07  6:58       ` Davidlohr Bueso
2016-04-07  7:38         ` Michal Hocko
2016-05-10 10:43   ` Tetsuo Handa
2016-05-10 11:53     ` Michal Hocko
2016-05-10 12:38       ` Peter Zijlstra
2016-05-10 13:57         ` Tetsuo Handa
2016-05-11  7:23         ` Michal Hocko
2016-05-11  8:28           ` Michal Hocko
2016-05-11  8:44             ` Peter Zijlstra
2016-05-11  9:04               ` Michal Hocko
2016-05-11  9:17                 ` Peter Zijlstra
2016-05-11  9:31                   ` Michal Hocko
2016-05-11  9:41                     ` Peter Zijlstra
2016-05-11 13:59                       ` Michal Hocko
2016-05-11 18:03                         ` Michal Hocko
2016-05-12 11:57                           ` [PATCH] locking, rwsem: Fix down_write_killable() Peter Zijlstra
2016-05-12 12:15                             ` [tip:locking/rwsem] locking/rwsem: " tip-bot for Peter Zijlstra
2016-05-12 16:59                             ` [PATCH] locking, rwsem: " Michal Hocko
2016-05-15 20:57                             ` [tip:locking/rwsem] locking/rwsem: " tip-bot for Peter Zijlstra
2016-05-12 12:12                           ` [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Peter Zijlstra
2016-05-12 12:19                             ` Michal Hocko
2016-05-12 13:58                               ` Peter Zijlstra
2016-05-12 19:42                               ` Waiman Long
2016-05-11  8:35           ` Peter Zijlstra
2016-05-11  9:02             ` Michal Hocko
2016-04-01 11:04 ` [PATCH 04/11] alpha, rwsem: provide __down_write_killable Michal Hocko
2016-04-01 11:04 ` [PATCH 05/11] ia64, " Michal Hocko
2016-04-01 11:04 ` [PATCH 06/11] s390, " Michal Hocko
2016-04-01 11:04 ` [PATCH 07/11] sh, " Michal Hocko
2016-04-01 11:04 ` [PATCH 08/11] sparc, " Michal Hocko
2016-04-01 11:04 ` [PATCH 09/11] xtensa, " Michal Hocko
2016-04-01 11:04 ` [PATCH 10/11] x86, " Michal Hocko
2016-04-06 18:31   ` Peter Zijlstra
2016-04-01 11:04 ` [PATCH 11/11] locking, rwsem: provide down_write_killable Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).