* [RFC 0/12] introduce down_write_killable for rw_semaphore
@ 2016-02-02 20:19 ` Michal Hocko
  0 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

Hi,
the following patchset implements a killable variant of the write lock
for rw_semaphore. My use case is to convert as many mmap_sem write users
as possible to the killable variant, which will help the oom_reaper [1]
to asynchronously tear down the oom victim's address space; the reaper
needs mmap_sem for read. This will reduce the likelihood of OOM livelocks
caused by the oom victim being stuck on a lock or another resource which
prevents it from reaching its exit path and releasing its memory. I
haven't implemented a killable variant of the read lock because I do not
have any use case for that API.

The patchset is organized as follows.
- Patch 1 is a trivial cleanup.
- Patch 2, I believe, shouldn't introduce any functional changes as per
  Documentation/memory-barriers.txt.
- Patch 3 is the preparatory work and necessary infrastructure for
  down_write_killable. It implements a generic __down_write_killable
  and prepares the write lock slow path to bail out early when told to.
- Patches 4-9 implement the arch-specific __down_write_killable, one
  patch per architecture. I haven't even tried to compile test anything
  but sparc, which uses CONFIG_RWSEM_GENERIC_SPINLOCK in allnoconfig.
  Those should be mostly trivial.
- One exception is x86, which replaces the current implementation of
  __down_write with the generic one to make it easier to read and to get
  rid of one level of indirection to the slow path. More on that in
  patch 10. I have no problem dropping patch 10 and reworking patch 11
  on top of the current inline asm, but I think the simpler code would
  be better.
- Finally patch 11 implements down_write_killable and ties everything
  together. I am not really an expert on lockdep so I hope I got it
  right.

Many of the arch-specific patches are basically the same and I can
squash them into a single patch if that is preferred, but one patch per
arch seemed preferable to me.

My patch converting the mmap_sem write users to the killable form is not
part of this series because it is not finished yet, but I guess it is
not really necessary for the RFC. The API is used in the same way as
mutex_lock_killable; see the sketch below.
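
For illustration, a minimal sketch of such a conversion for a
hypothetical mmap_sem write user could look as follows (the function and
its error handling are made up for the example; only down_write_killable
itself comes from this series):

	/*
	 * Hypothetical caller, for illustration only: an mmap_sem write
	 * user converted to the killable variant, used the same way as
	 * mutex_lock_killable.
	 */
	static int example_mmap_sem_write_user(struct mm_struct *mm)
	{
		if (down_write_killable(&mm->mmap_sem))
			return -EINTR;	/* woken by a fatal signal */

		/* ... modify the address space under the write lock ... */

		up_write(&mm->mmap_sem);
		return 0;
	}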

I have tested this on x86 under OOM situations with high mmap_sem
contention (basically many parallel page faults racing with many
parallel mmap/munmap tight loops), so the waiters for the write lock
were routinely interrupted by SIGKILL.
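
To give an idea of the workload, a minimal userspace sketch of the
contention part of the stress described above could look like the
following (thread count and mapping size are arbitrary, error handling
is omitted, and the OOM pressure itself has to be generated separately;
this is not the actual test program):

	/* build with: gcc -O2 -pthread stress.c */
	#include <pthread.h>
	#include <string.h>
	#include <sys/mman.h>

	#define SZ	(16UL << 20)

	/* fault the same range in over and over (takes mmap_sem for read) */
	static void *fault_loop(void *arg)
	{
		char *p = arg;

		for (;;) {
			madvise(p, SZ, MADV_DONTNEED);
			memset(p, 1, SZ);
		}
		return NULL;
	}

	/* tight mmap/munmap loop (takes mmap_sem for write) */
	static void *map_loop(void *arg)
	{
		for (;;) {
			void *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
				       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
			if (p != MAP_FAILED)
				munmap(p, SZ);
		}
		return NULL;
	}

	int main(void)
	{
		char *buf = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		pthread_t t;
		int i;

		for (i = 0; i < 8; i++) {
			pthread_create(&t, NULL, fault_loop, buf);
			pthread_create(&t, NULL, map_loop, NULL);
		}
		pthread_join(t, NULL);
		return 0;
	}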

The patches should apply cleanly on both Linus' tree and linux-next.

Any feedback is highly appreciated.
---
[1] http://lkml.kernel.org/r/1452094975-551-1-git-send-email-mhocko@kernel.org


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [RFC 01/12] locking, rwsem: get rid of __down_write_nested
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

__down_write_nested is no longer used anywhere else and its only
remaining caller (__down_write) passes 0 as the subclass. Ditch
__down_write_nested to make the code easier to follow.

This shouldn't introduce any functional change.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/include/asm/rwsem.h   | 7 +------
 arch/sh/include/asm/rwsem.h     | 5 -----
 arch/sparc/include/asm/rwsem.h  | 7 +------
 arch/x86/include/asm/rwsem.h    | 7 +------
 include/asm-generic/rwsem.h     | 7 +------
 include/linux/rwsem-spinlock.h  | 1 -
 kernel/locking/rwsem-spinlock.c | 7 +------
 7 files changed, 5 insertions(+), 36 deletions(-)

diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
index 4b43ee7e6776..8e52e72f3efc 100644
--- a/arch/s390/include/asm/rwsem.h
+++ b/arch/s390/include/asm/rwsem.h
@@ -90,7 +90,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	signed long old, new, tmp;
 
@@ -108,11 +108,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		rwsem_down_write_failed(sem);
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
index edab57265293..a5104bebd1eb 100644
--- a/arch/sh/include/asm/rwsem.h
+++ b/arch/sh/include/asm/rwsem.h
@@ -114,11 +114,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 		rwsem_downgrade_wake(sem);
 }
 
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
-{
-	__down_write(sem);
-}
-
 /*
  * implement exchange and add functionality
  */
diff --git a/arch/sparc/include/asm/rwsem.h b/arch/sparc/include/asm/rwsem.h
index 069bf4d663a1..e5a0d575bc7f 100644
--- a/arch/sparc/include/asm/rwsem.h
+++ b/arch/sparc/include/asm/rwsem.h
@@ -45,7 +45,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	long tmp;
 
@@ -55,11 +55,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		rwsem_down_write_failed(sem);
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index cad82c9c2fde..d79a218675bc 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -99,7 +99,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	long tmp;
 	asm volatile("# beginning down_write\n\t"
@@ -116,11 +116,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		     : "memory", "cc");
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index d6d5dc98d7da..b8d8a6cf4ca8 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -53,7 +53,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
+static inline void __down_write(struct rw_semaphore *sem)
 {
 	long tmp;
 
@@ -63,11 +63,6 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 		rwsem_down_write_failed(sem);
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
diff --git a/include/linux/rwsem-spinlock.h b/include/linux/rwsem-spinlock.h
index 561e8615528d..a733a5467e6c 100644
--- a/include/linux/rwsem-spinlock.h
+++ b/include/linux/rwsem-spinlock.h
@@ -34,7 +34,6 @@ struct rw_semaphore {
 extern void __down_read(struct rw_semaphore *sem);
 extern int __down_read_trylock(struct rw_semaphore *sem);
 extern void __down_write(struct rw_semaphore *sem);
-extern void __down_write_nested(struct rw_semaphore *sem, int subclass);
 extern int __down_write_trylock(struct rw_semaphore *sem);
 extern void __up_read(struct rw_semaphore *sem);
 extern void __up_write(struct rw_semaphore *sem);
diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index 3a5048572065..bab26104a5d0 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -191,7 +191,7 @@ int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * get a write lock on the semaphore
  */
-void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
+void __sched __down_write(struct rw_semaphore *sem)
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
@@ -227,11 +227,6 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 }
 
-void __sched __down_write(struct rw_semaphore *sem)
-{
-	__down_write_nested(sem, 0);
-}
-
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 02/12] locking, rwsem: drop explicit memory barriers
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

sh and xtensa seem to be the only architectures which use explicit
memory barriers for rw_semaphore operations even though they are not
really needed because the full memory barrier is always implied by
atomic_{inc,dec,add,sub}_return and cmpxchg. Remove them.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/sh/include/asm/rwsem.h     | 14 ++------------
 arch/xtensa/include/asm/rwsem.h | 14 ++------------
 2 files changed, 4 insertions(+), 24 deletions(-)

diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
index a5104bebd1eb..f6c951c7a875 100644
--- a/arch/sh/include/asm/rwsem.h
+++ b/arch/sh/include/asm/rwsem.h
@@ -24,9 +24,7 @@
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	if (atomic_inc_return((atomic_t *)(&sem->count)) > 0)
-		smp_wmb();
-	else
+	if (atomic_inc_return((atomic_t *)(&sem->count)) <= 0)
 		rwsem_down_read_failed(sem);
 }
 
@@ -37,7 +35,6 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	while ((tmp = sem->count) >= 0) {
 		if (tmp == cmpxchg(&sem->count, tmp,
 				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			smp_wmb();
 			return 1;
 		}
 	}
@@ -53,9 +50,7 @@ static inline void __down_write(struct rw_semaphore *sem)
 
 	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
 				(atomic_t *)(&sem->count));
-	if (tmp == RWSEM_ACTIVE_WRITE_BIAS)
-		smp_wmb();
-	else
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
 		rwsem_down_write_failed(sem);
 }
 
@@ -65,7 +60,6 @@ static inline int __down_write_trylock(struct rw_semaphore *sem)
 
 	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
 		      RWSEM_ACTIVE_WRITE_BIAS);
-	smp_wmb();
 	return tmp == RWSEM_UNLOCKED_VALUE;
 }
 
@@ -76,7 +70,6 @@ static inline void __up_read(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_dec_return((atomic_t *)(&sem->count));
 	if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
 		rwsem_wake(sem);
@@ -87,7 +80,6 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
-	smp_wmb();
 	if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
 			      (atomic_t *)(&sem->count)) < 0)
 		rwsem_wake(sem);
@@ -108,7 +100,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
 	if (tmp < 0)
 		rwsem_downgrade_wake(sem);
@@ -119,7 +110,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
  */
 static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
 {
-	smp_mb();
 	return atomic_add_return(delta, (atomic_t *)(&sem->count));
 }
 
diff --git a/arch/xtensa/include/asm/rwsem.h b/arch/xtensa/include/asm/rwsem.h
index 249619e7e7f2..593483f6e1ff 100644
--- a/arch/xtensa/include/asm/rwsem.h
+++ b/arch/xtensa/include/asm/rwsem.h
@@ -29,9 +29,7 @@
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	if (atomic_add_return(1,(atomic_t *)(&sem->count)) > 0)
-		smp_wmb();
-	else
+	if (atomic_add_return(1,(atomic_t *)(&sem->count)) <= 0)
 		rwsem_down_read_failed(sem);
 }
 
@@ -42,7 +40,6 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	while ((tmp = sem->count) >= 0) {
 		if (tmp == cmpxchg(&sem->count, tmp,
 				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			smp_wmb();
 			return 1;
 		}
 	}
@@ -58,9 +55,7 @@ static inline void __down_write(struct rw_semaphore *sem)
 
 	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
 				(atomic_t *)(&sem->count));
-	if (tmp == RWSEM_ACTIVE_WRITE_BIAS)
-		smp_wmb();
-	else
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
 		rwsem_down_write_failed(sem);
 }
 
@@ -70,7 +65,6 @@ static inline int __down_write_trylock(struct rw_semaphore *sem)
 
 	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
 		      RWSEM_ACTIVE_WRITE_BIAS);
-	smp_wmb();
 	return tmp == RWSEM_UNLOCKED_VALUE;
 }
 
@@ -81,7 +75,6 @@ static inline void __up_read(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_sub_return(1,(atomic_t *)(&sem->count));
 	if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
 		rwsem_wake(sem);
@@ -92,7 +85,6 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
-	smp_wmb();
 	if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
 			      (atomic_t *)(&sem->count)) < 0)
 		rwsem_wake(sem);
@@ -113,7 +105,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 {
 	int tmp;
 
-	smp_wmb();
 	tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
 	if (tmp < 0)
 		rwsem_downgrade_wake(sem);
@@ -124,7 +115,6 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
  */
 static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
 {
-	smp_mb();
 	return atomic_add_return(delta, (atomic_t *)(&sem->count));
 }
 
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 03/12] locking, rwsem: introduce basis for down_write_killable
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce a generic implementation necessary for down_write_killable.
This is a trivial extension of the already existing down_write call
which can be interrupted by SIGKILL.  This patch doesn't provide
down_write_killable yet because arches have to provide the necessary
pieces first.

rwsem_down_write_failed, which is the generic slow path for the write
lock, is extended to take a task state and renamed to
__rwsem_down_write_failed_state. The return value is either a valid
semaphore pointer or ERR_PTR(-EINTR).

rwsem_down_write_failed_killable is exported as a new way to wait for
the lock and be killable.

For the rwsem-spinlock implementation the current __down_write is
updated in a similar way to __rwsem_down_write_failed_state, except it
doesn't need any new exports, just a visible __down_write_killable.

Architectures which are not using the generic rwsem implementation are
supposed to provide their __down_write_killable implementation and
use rwsem_down_write_failed_killable for the slow path.

Signed-off-by: Michal Hocko <mhocko@suse.com>

fold me "locking, rwsem: introduce basis for down_write_killable"
---
 include/asm-generic/rwsem.h     | 13 +++++++++++++
 include/linux/rwsem-spinlock.h  |  1 +
 include/linux/rwsem.h           |  2 ++
 kernel/locking/rwsem-spinlock.c | 23 +++++++++++++++++++++--
 kernel/locking/rwsem-xadd.c     | 31 +++++++++++++++++++++++++------
 5 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index b8d8a6cf4ca8..b3f3aebbb994 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -63,6 +63,19 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	long tmp;
+	int ret;
+
+	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
+				     (atomic_long_t *)&sem->count);
+	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
diff --git a/include/linux/rwsem-spinlock.h b/include/linux/rwsem-spinlock.h
index a733a5467e6c..ae0528b834cd 100644
--- a/include/linux/rwsem-spinlock.h
+++ b/include/linux/rwsem-spinlock.h
@@ -34,6 +34,7 @@ struct rw_semaphore {
 extern void __down_read(struct rw_semaphore *sem);
 extern int __down_read_trylock(struct rw_semaphore *sem);
 extern void __down_write(struct rw_semaphore *sem);
+extern int __must_check __down_write_killable(struct rw_semaphore *sem);
 extern int __down_write_trylock(struct rw_semaphore *sem);
 extern void __up_read(struct rw_semaphore *sem);
 extern void __up_write(struct rw_semaphore *sem);
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 8f498cdde280..7d7ae029dac5 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -14,6 +14,7 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/atomic.h>
+#include <linux/err.h>
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 #include <linux/osq_lock.h>
 #endif
@@ -43,6 +44,7 @@ struct rw_semaphore {
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index bab26104a5d0..d1d04ca10d0e 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -191,11 +191,12 @@ int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * get a write lock on the semaphore
  */
-void __sched __down_write(struct rw_semaphore *sem)
+int __sched __down_write_state(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
 	unsigned long flags;
+	int ret = 0;
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
@@ -215,16 +216,34 @@ void __sched __down_write(struct rw_semaphore *sem)
 		 */
 		if (sem->count == 0)
 			break;
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		set_task_state(tsk, state);
 		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 		schedule();
+		if (signal_pending_state(state, current)) {
+			ret = -EINTR;
+			raw_spin_lock_irqsave(&sem->wait_lock, flags);
+			goto out;
+		}
 		raw_spin_lock_irqsave(&sem->wait_lock, flags);
 	}
 	/* got the lock */
 	sem->count = -1;
+out:
 	list_del(&waiter.list);
 
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+
+	return ret;
+}
+
+void __sched __down_write(struct rw_semaphore *sem)
+{
+	__down_write_state(sem, TASK_UNINTERRUPTIBLE);
+}
+
+int __sched __down_write_killable(struct rw_semaphore *sem)
+{
+	return __down_write_state(sem, TASK_KILLABLE);
 }
 
 /*
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index a4d4de05b2d1..5cec34f1ad6f 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -433,12 +433,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 /*
  * Wait until we successfully acquire the write lock
  */
-__visible
-struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
+static inline struct rw_semaphore *
+__rwsem_down_write_failed_state(struct rw_semaphore *sem, int state)
 {
 	long count;
 	bool waiting = true; /* any queued threads before us */
 	struct rwsem_waiter waiter;
+	struct rw_semaphore *ret = sem;
 
 	/* undo write bias from down_write operation, stop active locking */
 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
@@ -478,7 +479,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
 
 	/* wait until we successfully acquire the lock */
-	set_current_state(TASK_UNINTERRUPTIBLE);
+	set_current_state(state);
 	while (true) {
 		if (rwsem_try_write_lock(count, sem))
 			break;
@@ -487,20 +488,38 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 		/* Block until there are no active lockers. */
 		do {
 			schedule();
-			set_current_state(TASK_UNINTERRUPTIBLE);
+			if (signal_pending_state(state, current)) {
+				raw_spin_lock_irq(&sem->wait_lock);
+				ret = ERR_PTR(-EINTR);
+				goto out;
+			}
+			set_current_state(state);
 		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
 	__set_current_state(TASK_RUNNING);
-
+out:
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
-	return sem;
+	return ret;
+}
+
+__visible struct rw_semaphore * __sched
+rwsem_down_write_failed(struct rw_semaphore *sem)
+{
+	return __rwsem_down_write_failed_state(sem, TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(rwsem_down_write_failed);
 
+__visible struct rw_semaphore * __sched
+rwsem_down_write_failed_killable(struct rw_semaphore *sem)
+{
+	return __rwsem_down_write_failed_state(sem, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(rwsem_down_write_failed_killable);
+
 /*
  * handle waking up a waiter on the semaphore
  * - up_read/up_write has decremented the active part of count if we come here
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 04/12] alpha, rwsem: provide __down_write_killable
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce ___down_write for the fast path and reuse it for __down_write
and __down_write_killable, each using its respective generic slow path
(rwsem_down_write_failed and rwsem_down_write_failed_killable).

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/alpha/include/asm/rwsem.h | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
index a83bbea62c67..0131a7058778 100644
--- a/arch/alpha/include/asm/rwsem.h
+++ b/arch/alpha/include/asm/rwsem.h
@@ -63,7 +63,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	return res >= 0 ? 1 : 0;
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
+static inline long ___down_write(struct rw_semaphore *sem)
 {
 	long oldcount;
 #ifndef	CONFIG_SMP
@@ -83,10 +83,24 @@ static inline void __down_write(struct rw_semaphore *sem)
 	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
 	:"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory");
 #endif
-	if (unlikely(oldcount))
+	return oldcount;
+}
+
+static inline void __down_write(struct rw_semaphore *sem)
+{
+	if (unlikely(___down_write(sem)))
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (unlikely(___down_write(sem)))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 05/12] ia64, rwsem: provide __down_write_killable
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce ___down_write for the fast path and reuse it for __down_write
and __down_write_killable, each using its respective generic slow path
(rwsem_down_write_failed and rwsem_down_write_failed_killable).

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/ia64/include/asm/rwsem.h | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/ia64/include/asm/rwsem.h b/arch/ia64/include/asm/rwsem.h
index 3027e7516d85..5e78cb40d9df 100644
--- a/arch/ia64/include/asm/rwsem.h
+++ b/arch/ia64/include/asm/rwsem.h
@@ -49,8 +49,8 @@ __down_read (struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void
-__down_write (struct rw_semaphore *sem)
+static inline long
+___down_write (struct rw_semaphore *sem)
 {
 	long old, new;
 
@@ -59,10 +59,26 @@ __down_write (struct rw_semaphore *sem)
 		new = old + RWSEM_ACTIVE_WRITE_BIAS;
 	} while (cmpxchg_acq(&sem->count, old, new) != old);
 
-	if (old != 0)
+	return old;
+}
+
+static inline void
+__down_write (struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
 		rwsem_down_write_failed(sem);
 }
 
+static inline int
+__down_write_killable (struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 /*
  * unlock after reading
  */
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 06/12] s390, rwsem: provide __down_write_killable
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Introduce ___down_write for the fast path and reuse it for __down_write
and __down_write_killable, each using its respective generic slow path
(rwsem_down_write_failed and rwsem_down_write_failed_killable).

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/include/asm/rwsem.h | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
index 8e52e72f3efc..8e7b2b7e10f3 100644
--- a/arch/s390/include/asm/rwsem.h
+++ b/arch/s390/include/asm/rwsem.h
@@ -90,7 +90,7 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write(struct rw_semaphore *sem)
+static inline long ___down_write(struct rw_semaphore *sem)
 {
 	signed long old, new, tmp;
 
@@ -104,10 +104,25 @@ static inline void __down_write(struct rw_semaphore *sem)
 		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
 		: "Q" (sem->count), "m" (tmp)
 		: "cc", "memory");
-	if (old != 0)
+
+	return old;
+}
+
+static inline void __down_write(struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (___down_write(sem))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 07/12] sh, rwsem: provide __down_write_killable
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

which is uses the same fast path as __down_write except it falls back to
rwsem_down_write_failed_killable slow path and return -EINTR if killed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/sh/include/asm/rwsem.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/sh/include/asm/rwsem.h b/arch/sh/include/asm/rwsem.h
index f6c951c7a875..8a457b83d2a5 100644
--- a/arch/sh/include/asm/rwsem.h
+++ b/arch/sh/include/asm/rwsem.h
@@ -54,6 +54,19 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
+				(atomic_t *)(&sem->count));
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	int tmp;
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 08/12] sparc, rwsem: provide __down_write_killable
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

which is uses the same fast path as __down_write except it falls back to
rwsem_down_write_failed_killable slow path and return -EINTR if killed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/sparc/include/asm/rwsem.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/sparc/include/asm/rwsem.h b/arch/sparc/include/asm/rwsem.h
index e5a0d575bc7f..c1e9e32e8b63 100644
--- a/arch/sparc/include/asm/rwsem.h
+++ b/arch/sparc/include/asm/rwsem.h
@@ -55,6 +55,19 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	long tmp;
+
+	tmp = atomic64_add_return(RWSEM_ACTIVE_WRITE_BIAS,
+				  (atomic64_t *)(&sem->count));
+	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp;
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 09/12] xtensa, rwsem: provide __down_write_killable
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

which is uses the same fast path as __down_write except it falls back to
rwsem_down_write_failed_killable slow path and return -EINTR if killed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/xtensa/include/asm/rwsem.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/xtensa/include/asm/rwsem.h b/arch/xtensa/include/asm/rwsem.h
index 593483f6e1ff..6283823b8040 100644
--- a/arch/xtensa/include/asm/rwsem.h
+++ b/arch/xtensa/include/asm/rwsem.h
@@ -59,6 +59,19 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
+				(atomic_t *)(&sem->count));
+	if (tmp != RWSEM_ACTIVE_WRITE_BIAS)
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	int tmp;
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 10/12] x86, rwsem: simplify __down_write
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

x86 implementation of __down_write is using inline asm to optimize the
code flow. This however requires that it has to go over an additional hop
for the slow path call_rwsem_down_write_failed which has to
save_common_regs/restore_common_regs to preserve the calling convention.
This, however, doesn't add much because the fast path only saves one
register push/pop (rdx) when compared to the generic implementation:

Before:
0000000000000019 <down_write>:
  19:   e8 00 00 00 00          callq  1e <down_write+0x5>
  1e:   55                      push   %rbp
  1f:   48 ba 01 00 00 00 ff    movabs $0xffffffff00000001,%rdx
  26:   ff ff ff
  29:   48 89 f8                mov    %rdi,%rax
  2c:   48 89 e5                mov    %rsp,%rbp
  2f:   f0 48 0f c1 10          lock xadd %rdx,(%rax)
  34:   85 d2                   test   %edx,%edx
  36:   74 05                   je     3d <down_write+0x24>
  38:   e8 00 00 00 00          callq  3d <down_write+0x24>
  3d:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
  44:   00 00
  46:   5d                      pop    %rbp
  47:   48 89 47 38             mov    %rax,0x38(%rdi)
  4b:   c3                      retq

After:
0000000000000019 <down_write>:
  19:   e8 00 00 00 00          callq  1e <down_write+0x5>
  1e:   55                      push   %rbp
  1f:   48 b8 01 00 00 00 ff    movabs $0xffffffff00000001,%rax
  26:   ff ff ff
  29:   48 89 e5                mov    %rsp,%rbp
  2c:   53                      push   %rbx
  2d:   48 89 fb                mov    %rdi,%rbx
  30:   f0 48 0f c1 07          lock xadd %rax,(%rdi)
  35:   48 85 c0                test   %rax,%rax
  38:   74 05                   je     3f <down_write+0x26>
  3a:   e8 00 00 00 00          callq  3f <down_write+0x26>
  3f:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
  46:   00 00
  48:   48 89 43 38             mov    %rax,0x38(%rbx)
  4c:   5b                      pop    %rbx
  4d:   5d                      pop    %rbp
  4e:   c3                      retq

This doesn't seem to justify the code obfuscation and complexity. Use
the generic implementation instead.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/x86/include/asm/rwsem.h | 17 +++++------------
 arch/x86/lib/rwsem.S         |  9 ---------
 2 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index d79a218675bc..1b5e89b3643d 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -102,18 +102,11 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 static inline void __down_write(struct rw_semaphore *sem)
 {
 	long tmp;
-	asm volatile("# beginning down_write\n\t"
-		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"
-		     /* adds 0xffff0001, returns the old value */
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t"
-		     /* was the active mask 0 before? */
-		     "  jz        1f\n"
-		     "  call call_rwsem_down_write_failed\n"
-		     "1:\n"
-		     "# ending down_write"
-		     : "+m" (sem->count), "=d" (tmp)
-		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc");
+
+	tmp = atomic_long_add_return(RWSEM_ACTIVE_WRITE_BIAS,
+				     (atomic_long_t *)&sem->count);
+	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+		rwsem_down_write_failed(sem);
 }
 
 /*
diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
index 40027db99140..ea5c7c177483 100644
--- a/arch/x86/lib/rwsem.S
+++ b/arch/x86/lib/rwsem.S
@@ -57,7 +57,6 @@
  * is also the input argument to these helpers)
  *
  * The following can clobber %rdx because the asm clobbers it:
- *   call_rwsem_down_write_failed
  *   call_rwsem_wake
  * but %rdi, %rsi, %rcx, %r8-r11 always need saving.
  */
@@ -93,14 +92,6 @@ ENTRY(call_rwsem_down_read_failed)
 	ret
 ENDPROC(call_rwsem_down_read_failed)
 
-ENTRY(call_rwsem_down_write_failed)
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_down_write_failed
-	restore_common_regs
-	ret
-ENDPROC(call_rwsem_down_write_failed)
-
 ENTRY(call_rwsem_wake)
 	/* do nothing if still outstanding active readers */
 	__ASM_HALF_SIZE(dec) %__ASM_HALF_REG(dx)
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 11/12] x86, rwsem: provide __down_write_killable
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

which is uses the same fast path as __down_write except it falls back to
rwsem_down_write_failed_killable slow path and return -EINTR if killed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/x86/include/asm/rwsem.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index 1b5e89b3643d..64899daa51e2 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -109,6 +109,19 @@ static inline void __down_write(struct rw_semaphore *sem)
 		rwsem_down_write_failed(sem);
 }
 
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	long tmp;
+
+	tmp = atomic_long_add_return(RWSEM_ACTIVE_WRITE_BIAS,
+				     (atomic_long_t *)&sem->count);
+	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+
+	return 0;
+}
+
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 12/12] locking, rwsem: provide down_write_killable
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-02 20:19   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-02 20:19 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Now that all the architectures implement the necessary glue code
we can introduce down_write_killable. The only difference wrt. regular
down_write is that the slow path waits in TASK_KILLABLE state and the
interruption by the fatal signal is reported as -EINTR to the caller.
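
Since the rwsem is not held when -EINTR is returned, every caller has to
check the return value. A minimal, purely illustrative sketch (the function
name and the data it would protect are hypothetical, not part of this series):

	static int update_shared_state(struct rw_semaphore *lock)
	{
		if (down_write_killable(lock))
			return -EINTR;	/* fatal signal pending, lock was not taken */

		/* ... write-side critical section ... */

		up_write(lock);
		return 0;
	}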

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/lockdep.h | 15 +++++++++++++++
 include/linux/rwsem.h   |  1 +
 kernel/locking/rwsem.c  | 19 +++++++++++++++++++
 3 files changed, 35 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index c57e424d914b..f6bd57c0784b 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -446,6 +446,18 @@ do {								\
 	lock_acquired(&(_lock)->dep_map, _RET_IP_);			\
 } while (0)
 
+#define LOCK_CONTENDED_RETURN(_lock, try, lock)			\
+({								\
+	int ____err = 0;					\
+	if (!try(_lock)) {					\
+		lock_contended(&(_lock)->dep_map, _RET_IP_);	\
+		____err = lock(_lock);				\
+	}							\
+	if (!____err)						\
+		lock_acquired(&(_lock)->dep_map, _RET_IP_);	\
+	____err;						\
+})
+
 #else /* CONFIG_LOCK_STAT */
 
 #define lock_contended(lockdep_map, ip) do {} while (0)
@@ -454,6 +466,9 @@ do {								\
 #define LOCK_CONTENDED(_lock, try, lock) \
 	lock(_lock)
 
+#define LOCK_CONTENDED_RETURN(_lock, try, lock) \
+	lock(_lock)
+
 #endif /* CONFIG_LOCK_STAT */
 
 #ifdef CONFIG_LOCKDEP
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 7d7ae029dac5..d1c12d160ace 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -118,6 +118,7 @@ extern int down_read_trylock(struct rw_semaphore *sem);
  * lock for writing
  */
 extern void down_write(struct rw_semaphore *sem);
+extern int __must_check down_write_killable(struct rw_semaphore *sem);
 
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 205be0ce34de..c817216c1615 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -55,6 +55,25 @@ void __sched down_write(struct rw_semaphore *sem)
 EXPORT_SYMBOL(down_write);
 
 /*
+ * lock for writing
+ */
+int __sched down_write_killable(struct rw_semaphore *sem)
+{
+	might_sleep();
+	rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
+
+	if (LOCK_CONTENDED_RETURN(sem, __down_write_trylock, __down_write_killable)) {
+		rwsem_release(&sem->dep_map, 1, _RET_IP_);
+		return -EINTR;
+	}
+
+	rwsem_set_owner(sem);
+	return 0;
+}
+
+EXPORT_SYMBOL(down_write_killable);
+
+/*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
 int down_write_trylock(struct rw_semaphore *sem)
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [RFC 10/12] x86, rwsem: simplify __down_write
  2016-02-02 20:19   ` Michal Hocko
@ 2016-02-03  8:10     ` Ingo Molnar
  -1 siblings, 0 replies; 64+ messages in thread
From: Ingo Molnar @ 2016-02-03  8:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Michal Hocko, Linus Torvalds, Paul E. McKenney, Peter Zijlstra


* Michal Hocko <mhocko@kernel.org> wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> x86 implementation of __down_write is using inline asm to optimize the
> code flow. This however requires that it has to go over an additional hop
> for the slow path call_rwsem_down_write_failed which has to
> save_common_regs/restore_common_regs to preserve the calling convention.
> This, however, doesn't add much because the fast path only saves one
> register push/pop (rdx) when compared to the generic implementation:
> 
> Before:
> 0000000000000019 <down_write>:
>   19:   e8 00 00 00 00          callq  1e <down_write+0x5>
>   1e:   55                      push   %rbp
>   1f:   48 ba 01 00 00 00 ff    movabs $0xffffffff00000001,%rdx
>   26:   ff ff ff
>   29:   48 89 f8                mov    %rdi,%rax
>   2c:   48 89 e5                mov    %rsp,%rbp
>   2f:   f0 48 0f c1 10          lock xadd %rdx,(%rax)
>   34:   85 d2                   test   %edx,%edx
>   36:   74 05                   je     3d <down_write+0x24>
>   38:   e8 00 00 00 00          callq  3d <down_write+0x24>
>   3d:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
>   44:   00 00
>   46:   5d                      pop    %rbp
>   47:   48 89 47 38             mov    %rax,0x38(%rdi)
>   4b:   c3                      retq
> 
> After:
> 0000000000000019 <down_write>:
>   19:   e8 00 00 00 00          callq  1e <down_write+0x5>
>   1e:   55                      push   %rbp
>   1f:   48 b8 01 00 00 00 ff    movabs $0xffffffff00000001,%rax
>   26:   ff ff ff
>   29:   48 89 e5                mov    %rsp,%rbp
>   2c:   53                      push   %rbx
>   2d:   48 89 fb                mov    %rdi,%rbx
>   30:   f0 48 0f c1 07          lock xadd %rax,(%rdi)
>   35:   48 85 c0                test   %rax,%rax
>   38:   74 05                   je     3f <down_write+0x26>
>   3a:   e8 00 00 00 00          callq  3f <down_write+0x26>
>   3f:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
>   46:   00 00
>   48:   48 89 43 38             mov    %rax,0x38(%rbx)
>   4c:   5b                      pop    %rbx
>   4d:   5d                      pop    %rbp
>   4e:   c3                      retq

I'm not convinced about the removal of this optimization at all.

> This doesn't seem to justify the code obfuscation and complexity. Use
> the generic implementation instead.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/x86/include/asm/rwsem.h | 17 +++++------------
>  arch/x86/lib/rwsem.S         |  9 ---------
>  2 files changed, 5 insertions(+), 21 deletions(-)

Turn the argument around: would we be willing to save two instructions off the 
fast path of a commonly used locking construct, with such a simple optimization:

>  arch/x86/include/asm/rwsem.h | 17 ++++++++++++-----
>  arch/x86/lib/rwsem.S         |  9 +++++++++
>  2 files changed, 21 insertions(+), 5 deletions(-)

?

Yes!

So, if you want to remove the assembly code - can we achieve that without hurting 
the generated fast path, using the compiler?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 07/12] sh, rwsem: provide __down_write_killable
  2016-02-02 20:19   ` Michal Hocko
@ 2016-02-03 11:19     ` Sergei Shtylyov
  -1 siblings, 0 replies; 64+ messages in thread
From: Sergei Shtylyov @ 2016-02-03 11:19 UTC (permalink / raw)
  To: Michal Hocko, LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko

Hello.

On 2/2/2016 11:19 PM, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.com>
>
> which is uses the same fast path as __down_write except it falls back to

    s/is//. Same typo (?) in the other patches...

> rwsem_down_write_failed_killable slow path and return -EINTR if killed.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

[...]

MBR, Sergei


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 10/12] x86, rwsem: simplify __down_write
  2016-02-03  8:10     ` Ingo Molnar
@ 2016-02-03 12:10       ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-03 12:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Linus Torvalds, Paul E. McKenney, Peter Zijlstra

On Wed 03-02-16 09:10:16, Ingo Molnar wrote:
> 
> * Michal Hocko <mhocko@kernel.org> wrote:
> 
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > x86 implementation of __down_write is using inline asm to optimize the
> > code flow. This however requires that it has to go over an additional hop
> > for the slow path call_rwsem_down_write_failed which has to
> > save_common_regs/restore_common_regs to preserve the calling convention.
> > This, however, doesn't add much because the fast path only saves one
> > register push/pop (rdx) when compared to the generic implementation:
> > 
> > Before:
> > 0000000000000019 <down_write>:
> >   19:   e8 00 00 00 00          callq  1e <down_write+0x5>
> >   1e:   55                      push   %rbp
> >   1f:   48 ba 01 00 00 00 ff    movabs $0xffffffff00000001,%rdx
> >   26:   ff ff ff
> >   29:   48 89 f8                mov    %rdi,%rax
> >   2c:   48 89 e5                mov    %rsp,%rbp
> >   2f:   f0 48 0f c1 10          lock xadd %rdx,(%rax)
> >   34:   85 d2                   test   %edx,%edx
> >   36:   74 05                   je     3d <down_write+0x24>
> >   38:   e8 00 00 00 00          callq  3d <down_write+0x24>
> >   3d:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
> >   44:   00 00
> >   46:   5d                      pop    %rbp
> >   47:   48 89 47 38             mov    %rax,0x38(%rdi)
> >   4b:   c3                      retq
> > 
> > After:
> > 0000000000000019 <down_write>:
> >   19:   e8 00 00 00 00          callq  1e <down_write+0x5>
> >   1e:   55                      push   %rbp
> >   1f:   48 b8 01 00 00 00 ff    movabs $0xffffffff00000001,%rax
> >   26:   ff ff ff
> >   29:   48 89 e5                mov    %rsp,%rbp
> >   2c:   53                      push   %rbx
> >   2d:   48 89 fb                mov    %rdi,%rbx
> >   30:   f0 48 0f c1 07          lock xadd %rax,(%rdi)
> >   35:   48 85 c0                test   %rax,%rax
> >   38:   74 05                   je     3f <down_write+0x26>
> >   3a:   e8 00 00 00 00          callq  3f <down_write+0x26>
> >   3f:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
> >   46:   00 00
> >   48:   48 89 43 38             mov    %rax,0x38(%rbx)
> >   4c:   5b                      pop    %rbx
> >   4d:   5d                      pop    %rbp
> >   4e:   c3                      retq
> 
> I'm not convinced about the removal of this optimization at all.

OK, fair enough. As I've mentioned in the cover letter I do not really
insist on this patch. I just found the current code too ugly to keep
without a good reason, because down_write is a function call anyway, so
saving one push/pop seems negligible compared to the call itself.
Moreover, this is a write lock, which is expected to be heavier. It is
the read path that is expected to be light, while contention (the slow
path) is expected on the write lock.

That being said, if you really believe that the current code is easier
to maintain, then I will not pursue this patch. The rest doesn't really
depend on it. I will just respin the follow-up x86-specific
__down_write_killable to follow the same code convention.

[...]
> So, if you want to remove the assembly code - can we achieve that without hurting 
> the generated fast path, using the compiler?

One way would be to do the same thing as the mutex code does and make
the fast path an inline. This could bloat the kernel and require some
additional changes to allow arch-specific reimplementations, though, so
I didn't want to go that path.
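
Just to illustrate what I mean, something along these lines (a rough,
untested sketch with the lockdep annotation and owner tracking left out,
not a proposal):

	/* sketch only: fast path in the header, inlined into every caller */
	static inline void down_write(struct rw_semaphore *sem)
	{
		long tmp = atomic_long_add_return(RWSEM_ACTIVE_WRITE_BIAS,
						  (atomic_long_t *)&sem->count);
		if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
			rwsem_down_write_failed(sem);	/* stays out of line */
	}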
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 07/12] sh, rwsem: provide __down_write_killable
  2016-02-03 11:19     ` Sergei Shtylyov
@ 2016-02-03 12:11       ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-03 12:11 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch

On Wed 03-02-16 14:19:05, Sergei Shtylyov wrote:
> Hello.
> 
> On 2/2/2016 11:19 PM, Michal Hocko wrote:
> 
> >From: Michal Hocko <mhocko@suse.com>
> >
> >which is uses the same fast path as __down_write except it falls back to
> 
>    s/is//. Same typo (?) in the other patches...

Thanks, will fix that.

> 
> >rwsem_down_write_failed_killable slow path and return -EINTR if killed.
> >
> >Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> [...]
> 
> MBR, Sergei
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [RFC 11/12 v1] x86, rwsem: provide __down_write_killable
  2016-02-02 20:19   ` Michal Hocko
  (?)
  (?)
@ 2016-02-17 16:41     ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-17 16:41 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

OK, so I have dropped patch 10 and reworked the x86 part to use the same
asm as __down_write uses. Does this look any better?

I am not an expert in inline asm, so the way I am doing it might not be
optimal, but at least the generated code looks sane (no changes for the
regular __down_write).
---
From d9a24cd6d6eb48602b11df56ecc3ea4e223ac18d Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Mon, 1 Feb 2016 18:21:51 +0100
Subject: [PATCH] x86, rwsem: provide __down_write_killable

which uses the same fast path as __down_write except that it falls back
to the call_rwsem_down_write_failed_killable slow path and returns -EINTR
if killed. To avoid code duplication, extract the skeleton of
__down_write into a helper macro which just takes the semaphore
and the slow path function to be called.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/x86/include/asm/rwsem.h | 41 ++++++++++++++++++++++++++++-------------
 arch/x86/lib/rwsem.S         |  8 ++++++++
 2 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index d79a218675bc..4c3d90dbe89a 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -99,21 +99,36 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
+#define ____down_write(sem, slow_path)			\
+({							\
+	long tmp;					\
+	struct rw_semaphore* ret = sem;			\
+	asm volatile("# beginning down_write\n\t"	\
+		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"	\
+		     /* adds 0xffff0001, returns the old value */ \
+		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
+		     /* was the active mask 0 before? */\
+		     "  jz        1f\n"			\
+		     "  call " slow_path "\n"		\
+		     "1:\n"				\
+		     "# ending down_write"		\
+		     : "+m" (sem->count), "=d" (tmp), "+a" (ret)	\
+		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
+		     : "memory", "cc");			\
+	ret;						\
+})
+
 static inline void __down_write(struct rw_semaphore *sem)
 {
-	long tmp;
-	asm volatile("# beginning down_write\n\t"
-		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"
-		     /* adds 0xffff0001, returns the old value */
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t"
-		     /* was the active mask 0 before? */
-		     "  jz        1f\n"
-		     "  call call_rwsem_down_write_failed\n"
-		     "1:\n"
-		     "# ending down_write"
-		     : "+m" (sem->count), "=d" (tmp)
-		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc");
+	____down_write(sem, "call_rwsem_down_write_failed");
+}
+
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (IS_ERR(____down_write(sem, "call_rwsem_down_write_failed_killable")))
+		return -EINTR;
+
+	return 0;
 }
 
 /*
diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
index 40027db99140..d1a1397e1fb3 100644
--- a/arch/x86/lib/rwsem.S
+++ b/arch/x86/lib/rwsem.S
@@ -101,6 +101,14 @@ ENTRY(call_rwsem_down_write_failed)
 	ret
 ENDPROC(call_rwsem_down_write_failed)
 
+ENTRY(call_rwsem_down_write_failed_killable)
+	save_common_regs
+	movq %rax,%rdi
+	call rwsem_down_write_failed_killable
+	restore_common_regs
+	ret
+ENDPROC(call_rwsem_down_write_failed_killable)
+
 ENTRY(call_rwsem_wake)
 	/* do nothing if still outstanding active readers */
 	__ASM_HALF_SIZE(dec) %__ASM_HALF_REG(dx)
-- 
2.7.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC 11/12 v1] x86, rwsem: provide __down_write_killable
@ 2016-02-17 16:41     ` Michal Hocko
  0 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-17 16:41 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

OK, so I have dropped patch 10 and reworked the x86 part to use the same
asm as __down_write uses. Does this look any better?

I am not an expert in inline asm so the way I am doing it might be not
optimal but at least the gerenated code looks sane (no changes for the
regular __down_write).
---

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 0/12] introduce down_write_killable for rw_semaphore
  2016-02-02 20:19 ` Michal Hocko
@ 2016-02-19 12:15   ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-02-19 12:15 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch

Are there any fundamental objections to the patchset? I plan to resubmit
next week with the changes from the feedback along with the mmap_sem
down_write_killable usage.

On Tue 02-02-16 21:19:17, Michal Hocko wrote:
> Hi,
> the following patchset implements a killable variant of write lock for
> rw_semaphore. My usecase is to turn as many mmap_sem write users to use
> a killable variant which will be helpful for the oom_reaper [1] to
> asynchronously tear down the oom victim address space which requires
> mmap_sem for read. This will reduce a likelihood of OOM livelocks caused
> by oom victim being stuck on a lock or other resource which prevents it
> to reach its exit path and release the memory. I haven't implemented
> the killable variant of the read lock because I do not have any usecase
> for this API.
> 
> The patchset is organized as follows.
> - Patch 1 is a trivial cleanup
> - Patch 2, I belive, shouldn't introduce any functional changes as per
>   Documentation/memory-barriers.txt.
> - Patch 3 is the preparatory work and necessary infrastructure for
>   down_write_killable. It implements generic __down_write_killable
>   and prepares the write lock slow path to bail out earlier when told so
> - Patch 4-9 are implementing arch specific __down_write_killable. One
>   patch per architecture. I haven't even tried to compile test anything but
>   sparch which uses CONFIG_RWSEM_GENERIC_SPINLOCK in allnoconfig.
>   Those shold be mostly trivial.
> - One exception is x86 which replaces the current implementation of
>   __down_write with the generic one to make easier to read and get rid
>   of one level of indirection to the slow path. More on that in patch 10.
>   I do not have any problems to drop patch 10 and rework 11 to the current
>   inline asm but I think the easier code would be better.
> - finally patch 11 implements down_write_killable and ties everything
>   together. I am not really an expert on lockdep so I hope I got it right.
> 
> Many of arch specific patches are basically same and I can squash them
> into one patch if this is preferred but I thought that one patch per
> arch is preferable.
> 
> My patch to change mmap_sem write users to killable form is not part
> of the series because it is not finished yet but I guess it is not
> really necessary for the RFC. The API is used in the same way as
> mutex_lock_killable.
> 
> I have tested on x86 with OOM situations with high mmap_sem contention
> (basically many parallel page faults racing with many parallel mmap/munmap
> tight loops) so the waiters for the write locks are routinely interrupted
> by SIGKILL.
> 
> Patches should apply cleanly on both Linus and next tree.
> 
> Any feedback is highly appreciated.
> ---
> [1] http://lkml.kernel.org/r/1452094975-551-1-git-send-email-mhocko@kernel.org

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 0/12] introduce down_write_killable for rw_semaphore
  2016-02-02 20:19 ` Michal Hocko
@ 2016-03-09 12:18   ` Ingo Molnar
  -1 siblings, 0 replies; 64+ messages in thread
From: Ingo Molnar @ 2016-03-09 12:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Linus Torvalds, Peter Zijlstra


* Michal Hocko <mhocko@kernel.org> wrote:

> Hi,
>
> the following patchset implements a killable variant of write lock for 
> rw_semaphore. My usecase is to turn as many mmap_sem write users to use a 
> killable variant which will be helpful for the oom_reaper [1] to asynchronously 
> tear down the oom victim address space which requires mmap_sem for read. This 
> will reduce a likelihood of OOM livelocks caused by oom victim being stuck on a 
> lock or other resource which prevents it to reach its exit path and release the 
> memory. [...]

So I'm a tiny bit concerned about these arguments.

AFAICS killability here just makes existing system calls more interruptible - 
right? In that sense it's not really a livelock scenario: it just takes a shorter 
time for resources to be released.

If a livelock is possible (where resources are never released) then I'd like to 
see a specific example of such a livelock.

You have the other patch-set:

   [PATCH 0/18] change mmap_sem taken for write killable

that makes use of down_write_killable(), and there you argue:

 [...] this is a follow up work for oom_reaper [1]. As the async OOM killing 
 depends on oom_sem for read we would really appreciate if a holder for write 
 stood in the way. This patchset is changing many of down_write calls to be 
 killable to help those cases when the writer is blocked and waiting for readers 
 to release the lock and so help __oom_reap_task to process the oom victim.

there seems to be a misunderstanding: if a writer is blocked waiting for readers 
then no new readers are allowed - the writer will get its turn the moment all 
existing readers drop the lock.

So there's no livelock scenario - it's "only" about latencies.

And once we realize that it's about latencies (assuming I'm right!), not about 
correctness per se, I'm wondering whether it would be a good idea to introduce 
down_write_interruptible(), instead of down_write_killable().

I'd love various processes to quit faster on Ctrl-C as well, not just on kill -9!

This would also test the new code paths a lot better: kill -9 is a lot rarer than 
regular interruption.
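
To sketch what this would boil down to (simplified, helper name made up,
not actual patch content): the interruptible variant would differ from
the killable one essentially only in the task state used while sleeping
in the write slow path and in which pending signals abort the wait:

	/* hypothetical core of the write-lock slow path */
	set_current_state(TASK_INTERRUPTIBLE);	/* TASK_KILLABLE in the killable variant */
	while (!writer_got_the_lock(sem)) {	/* helper name made up */
		if (signal_pending_state(TASK_INTERRUPTIBLE, current))
			goto out_nolock;	/* becomes -EINTR */
		schedule();
		set_current_state(TASK_INTERRUPTIBLE);
	}
	__set_current_state(TASK_RUNNING);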

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 0/12] introduce down_write_killable for rw_semaphore
  2016-03-09 12:18   ` Ingo Molnar
@ 2016-03-09 12:56     ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-03-09 12:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Linus Torvalds, Peter Zijlstra

On Wed 09-03-16 13:18:50, Ingo Molnar wrote:
> 
> * Michal Hocko <mhocko@kernel.org> wrote:
> 
> > Hi,
> >
> > the following patchset implements a killable variant of write lock for 
> > rw_semaphore. My usecase is to turn as many mmap_sem write users to use a 
> > killable variant which will be helpful for the oom_reaper [1] to asynchronously 
> > tear down the oom victim address space which requires mmap_sem for read. This 
> > will reduce a likelihood of OOM livelocks caused by oom victim being stuck on a 
> > lock or other resource which prevents it to reach its exit path and release the 
> > memory. [...]
> 
> So I'm a tiny bit concerned about this arguments.
> 
> AFAICS killability here just makes existing system calls more interruptible - 
> right?

see below

> In that sense that's not really a livelock scenario: it just takes shorter 
> time for resources to be released.
> 
> If a livelock is possible (where resources are never released) then I'd like to 
> see a specific example of such a livelock.
> 
> You have the other patch-set:
> 
>    [PATCH 0/18] change mmap_sem taken for write killable
> 
> that makes use of down_write_killable(), and there you argue:
> 
>  [...] this is a follow up work for oom_reaper [1]. As the async OOM killing 
>  depends on oom_sem for read we would really appreciate if a holder for write 
>  stood in the way. This patchset is changing many of down_write calls to be 
>  killable to help those cases when the writer is blocked and waiting for readers 
>  to release the lock and so help __oom_reap_task to process the oom victim.
> 
> there seems to be a misunderstanding: if a writer is blocked waiting for readers 
> then no new readers are allowed - the writer will get its turn the moment all 
> existing readers drop the lock.

Readers might be blocked e.g. on a memory allocation which cannot
proceed due to OOM. Such a reader might be operating on a remote mm.
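
To illustrate the point (hypothetical code, names made up): such a
reader takes mmap_sem of another task's mm and then blocks in an
allocation which cannot make progress while the system is OOM, so the
waiting writer - and everything queued behind it, including the
oom_reaper's down_read - is stuck behind it:

static int inspect_remote_mm(struct mm_struct *mm)	/* mm != current->mm */
{
	void *buf;

	down_read(&mm->mmap_sem);
	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);	/* may block waiting for memory */
	if (!buf) {
		up_read(&mm->mmap_sem);
		return -ENOMEM;
	}
	/* ... walk the remote address space ... */
	kfree(buf);
	up_read(&mm->mmap_sem);
	return 0;
}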

> So there's no livelock scenario - it's "only" about latencies.

Latency is certainly one aspect of it as well because the sooner the
mmap_sem gets released for other readers, the sooner the oom_reaper can
tear down the victim's address space and free up memory so that we do
not have to wait for the victim to exit.

> And once we realize that it's about latencies (assuming I'm right!), not about 
> correctness per se, I'm wondering whether it would be a good idea to introduce 
> down_write_interruptible(), instead of down_write_killable().

I am not against an interruptible variant either, but I suspect that some
paths are not expected to return EINTR. I haven't checked them for this,
but killable is sufficient for the problem I am trying to solve. That
problem is real, while the latencies do not seem to be that pressing.

down_write_interruptible will be trivial to do on top.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 0/12] introduce down_write_killable for rw_semaphore
  2016-03-09 12:56     ` Michal Hocko
@ 2016-03-09 13:17       ` Ingo Molnar
  -1 siblings, 0 replies; 64+ messages in thread
From: Ingo Molnar @ 2016-03-09 13:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Linus Torvalds, Peter Zijlstra


* Michal Hocko <mhocko@kernel.org> wrote:

> >  [...] this is a follow up work for oom_reaper [1]. As the async OOM killing 
> >  depends on oom_sem for read we would really appreciate if a holder for write 
> >  stood in the way. This patchset is changing many of down_write calls to be 
> >  killable to help those cases when the writer is blocked and waiting for 
> >  readers to release the lock and so help __oom_reap_task to process the oom 
> >  victim.
> > 
> > there seems to be a misunderstanding: if a writer is blocked waiting for 
> > readers then no new readers are allowed - the writer will get its turn the 
> > moment all existing readers drop the lock.
> 
> Readers might be blocked e.g. on the memory allocation which cannot proceed due 
> to OOM. Such a reader might be operating on a remote mm.

Doing complex allocations with the mm locked looks fragile no matter what: we 
should add debugging code that warns if allocations are done with a remote mm 
locked. (it should be trivial)
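
A very rough sketch of that debugging idea (the helper and its hook
point are made up; a real version would need to track which mm's
mmap_sem the task holds, e.g. via lockdep):

static inline void warn_on_alloc_with_remote_mm(struct mm_struct *locked_mm)
{
	/* locked_mm: the mm whose mmap_sem the caller is known to hold */
	WARN_ONCE(locked_mm && current->mm && locked_mm != current->mm,
		  "allocation with a remote mm locked\n");
}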

In fact people were thinking about turning the mm semaphore into an rwlock - with 
that no blocking call would be possible with the lock held.

So I maintain:

> > So there's no livelock scenario - it's "only" about latencies.

With a qualification: s/only/mostly ;-)

> Latency is certainly one aspect of it as well because the sooner the mmap_sem 
> gets released for other readers to sooner the oom_reaper can tear down the 
> victims address space and release the memory and free up some memory so that we 
> do not have to wait for the victim to exit.
> 
> > And once we realize that it's about latencies (assuming I'm right!), not about 
> > correctness per se, I'm wondering whether it would be a good idea to introduce 
> > down_write_interruptible(), instead of down_write_killable().
> 
> I am not against interruptible variant as well but I suspect that some paths are 
> not expected to return EINTR. I haven't checked them for this but killable is 
> sufficient for the problem I am trying to solve. That problem is real while 
> latencies do not seem to be that eminent.

If they don't expect EINTR then they sure don't expect SIGKILL either!

There's a (very) low number of system calls that are not interruptible, but the 
vast majority is.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 0/12] introduce down_write_killable for rw_semaphore
  2016-03-09 13:17       ` Ingo Molnar
@ 2016-03-09 13:28         ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-03-09 13:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Linus Torvalds, Peter Zijlstra

On Wed 09-03-16 14:17:10, Ingo Molnar wrote:
> 
> * Michal Hocko <mhocko@kernel.org> wrote:
> 
> > >  [...] this is a follow up work for oom_reaper [1]. As the async OOM killing 
> > >  depends on oom_sem for read we would really appreciate if a holder for write 
> > >  stood in the way. This patchset is changing many of down_write calls to be 
> > >  killable to help those cases when the writer is blocked and waiting for 
> > >  readers to release the lock and so help __oom_reap_task to process the oom 
> > >  victim.
> > > 
> > > there seems to be a misunderstanding: if a writer is blocked waiting for 
> > > readers then no new readers are allowed - the writer will get its turn the 
> > > moment all existing readers drop the lock.
> > 
> > Readers might be blocked e.g. on the memory allocation which cannot proceed due 
> > to OOM. Such a reader might be operating on a remote mm.
> 
> Doing complex allocations with the mm locked looks fragile no matter what: we 
> should add debugging code that warns if allocations are done with a remote mm 
> locked. (it should be trivial)

No matter how fragile that is, it is not something non-existent. Just
have a look at use_mm, for example. We definitely do not want to warn
about those, right?
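
For reference, the use_mm pattern referred to above looks roughly like
this (simplified, function name made up, error handling omitted): a
kernel thread temporarily adopts a user mm, and a fault on that address
space then takes the remote mm's mmap_sem for read and may have to
allocate memory to handle the fault:

static void kthread_touches_user_memory(struct mm_struct *mm,
					const char __user *uaddr)
{
	char byte;

	use_mm(mm);	/* only valid from a kernel thread */
	if (get_user(byte, uaddr))
		;	/* error handling omitted in this sketch */
	unuse_mm(mm);
}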

> In fact people were thining about turning the mm semaphore into a rwlock - with 
> that no blocking call should be possible with the lock held.
> 
> So I maintain:
> 
> > > So there's no livelock scenario - it's "only" about latencies.
> 
> With a qualification: s/only/mostly ;-)
> 
> > Latency is certainly one aspect of it as well because the sooner the mmap_sem 
> > gets released for other readers to sooner the oom_reaper can tear down the 
> > victims address space and release the memory and free up some memory so that we 
> > do not have to wait for the victim to exit.
> > 
> > > And once we realize that it's about latencies (assuming I'm right!), not about 
> > > correctness per se, I'm wondering whether it would be a good idea to introduce 
> > > down_write_interruptible(), instead of down_write_killable().
> > 
> > I am not against interruptible variant as well but I suspect that some paths are 
> > not expected to return EINTR. I haven't checked them for this but killable is 
> > sufficient for the problem I am trying to solve. That problem is real while 
> > latencies do not seem to be that eminent.
> 
> If they don't expect EINTR then they sure don't expect SIGKILL either!

Why? Each syscall is already killable, as the task might be killed by the
OOM killer.

> There's a (very) low number of system calls that are not interruptible, but the 
> vast majority is.

That might be true. I just fail to see how this is related to the
particular problem I am trying to solve. As I've said, those callsites
which cause latency problems can later be converted to interruptible
waiting trivially.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 0/12] introduce down_write_killable for rw_semaphore
  2016-03-09 13:28         ` Michal Hocko
@ 2016-03-09 13:43           ` Ingo Molnar
  -1 siblings, 0 replies; 64+ messages in thread
From: Ingo Molnar @ 2016-03-09 13:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Linus Torvalds, Peter Zijlstra


* Michal Hocko <mhocko@kernel.org> wrote:

> On Wed 09-03-16 14:17:10, Ingo Molnar wrote:
> > 
> > * Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > > >  [...] this is a follow up work for oom_reaper [1]. As the async OOM killing 
> > > >  depends on oom_sem for read we would really appreciate if a holder for write 
> > > >  stood in the way. This patchset is changing many of down_write calls to be 
> > > >  killable to help those cases when the writer is blocked and waiting for 
> > > >  readers to release the lock and so help __oom_reap_task to process the oom 
> > > >  victim.
> > > > 
> > > > there seems to be a misunderstanding: if a writer is blocked waiting for 
> > > > readers then no new readers are allowed - the writer will get its turn the 
> > > > moment all existing readers drop the lock.
> > > 
> > > Readers might be blocked e.g. on the memory allocation which cannot proceed due 
> > > to OOM. Such a reader might be operating on a remote mm.
> > 
> > Doing complex allocations with the mm locked looks fragile no matter what: we 
> > should add debugging code that warns if allocations are done with a remote mm 
> > locked. (it should be trivial)
> 
> No matter how fragile is that it is not something non-existent. Just
> have a look at use_mm for example. We definitely do not want to warn
> about those, right?

Sure we care about eliminating fragility, and usage does not seem to be widespread 
at all:

 triton:~/tip> git grep -w use_mm

 drivers/staging/rdma/hfi1/user_sdma.c:          use_mm(req->pq->user_mm);
 drivers/usb/gadget/function/f_fs.c:             use_mm(io_data->mm);
 drivers/usb/gadget/legacy/inode.c:      use_mm(mm);
 drivers/vhost/vhost.c:  use_mm(dev->mm);

I think we also want to keep our general flexibility wrt. eventually turning the 
mmap_sem into a spinlock ...

> > > I am not against interruptible variant as well but I suspect that some paths 
> > > are not expected to return EINTR. I haven't checked them for this but 
> > > killable is sufficient for the problem I am trying to solve. That problem is 
> > > real while latencies do not seem to be that eminent.
> > 
> > If they don't expect EINTR then they sure don't expect SIGKILL either!
> 
> Why? Each syscall already is killable as the task might be killed by the OOM 
> killer.

Not all syscalls are interruptible - for example sys_sync() isn't:

SYSCALL_DEFINE0(sync)
{
        int nowait = 0, wait = 1;

        wakeup_flusher_threads(0, WB_REASON_SYNC);
        iterate_supers(sync_inodes_one_sb, NULL);
        iterate_supers(sync_fs_one_sb, &nowait);
        iterate_supers(sync_fs_one_sb, &wait);
        iterate_bdevs(fdatawrite_one_bdev, NULL);
        iterate_bdevs(fdatawait_one_bdev, NULL);
        if (unlikely(laptop_mode))
                laptop_sync_completion();
        return 0;
}

> > There's a (very) low number of system calls that are not interruptible, but 
> > the vast majority is.
> 
> That might be true. I just fail to see how this is related to the
> particular problem I am trying to solve. As I've said those callsites
> which cause problems with latencies can be later converted to
> interruptible waiting trivially.

So my problem as I see it is the following: you are adding a rare API to an 
already complex locking interface, further complicating already complicated MM 
code paths in various ways. Only to help a case that is a third type of rare: 
OOM-kill.

That's a surefire whack-a-mole nest of bugs, if I've ever seen one.

What I am suggesting instead is a slight modification of the concept: re-phrase 
the problem set and think in broader terms of interruptibility: make certain MM 
operations, especially ones which tend to hinder OOM-kill latencies, more 
interruptible - which implicitly also makes them more OOM-killable.

That's a win-win as I see it: both your usecase and a lot of other usecases 
will be improved - and it will also be tested a lot more than any OOM-kill path 
will be.

I might be wrong in the end, but your counterarguments were not convincing so far 
(to me).

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 0/12] introduce down_write_killable for rw_semaphore
  2016-03-09 13:43           ` Ingo Molnar
@ 2016-03-09 14:41             ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-03-09 14:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Linus Torvalds, Peter Zijlstra

On Wed 09-03-16 14:43:39, Ingo Molnar wrote:
> 
> * Michal Hocko <mhocko@kernel.org> wrote:
> 
> > On Wed 09-03-16 14:17:10, Ingo Molnar wrote:
> > > 
> > > * Michal Hocko <mhocko@kernel.org> wrote:
> > > 
> > > > >  [...] this is a follow up work for oom_reaper [1]. As the async OOM killing 
> > > > >  depends on oom_sem for read we would really appreciate if a holder for write 
> > > > >  stood in the way. This patchset is changing many of down_write calls to be 
> > > > >  killable to help those cases when the writer is blocked and waiting for 
> > > > >  readers to release the lock and so help __oom_reap_task to process the oom 
> > > > >  victim.
> > > > > 
> > > > > there seems to be a misunderstanding: if a writer is blocked waiting for 
> > > > > readers then no new readers are allowed - the writer will get its turn the 
> > > > > moment all existing readers drop the lock.
> > > > 
> > > > Readers might be blocked e.g. on the memory allocation which cannot proceed due 
> > > > to OOM. Such a reader might be operating on a remote mm.
> > > 
> > > Doing complex allocations with the mm locked looks fragile no matter what: we 
> > > should add debugging code that warns if allocations are done with a remote mm 
> > > locked. (it should be trivial)
> > 
> > No matter how fragile is that it is not something non-existent. Just
> > have a look at use_mm for example. We definitely do not want to warn
> > about those, right?
> 
> Sure we care about eliminating fragility, and usage does not seem to be widespread 
> at all:

This was just an example; we have others. khugepaged, which tries to
create THPs in the background, or the per-pid proc file handlers might
hold the lock while allocating, and who knows what else.
 
>  triton:~/tip> git grep -w use_mm
> 
>  drivers/staging/rdma/hfi1/user_sdma.c:          use_mm(req->pq->user_mm);
>  drivers/usb/gadget/function/f_fs.c:             use_mm(io_data->mm);
>  drivers/usb/gadget/legacy/inode.c:      use_mm(mm);
>  drivers/vhost/vhost.c:  use_mm(dev->mm);
> 
> I think we also want to keep our general flexibility wrt. eventually turning the 
> mmap_sem into a spinlock ...

Many places which hold mmap_sem are sleepable right now (e.g. doing
GFP_KERNEL allocations) and I am not really sure we can change all of
them not to be.
 
> > > > I am not against interruptible variant as well but I suspect that some paths 
> > > > are not expected to return EINTR. I haven't checked them for this but 
> > > > killable is sufficient for the problem I am trying to solve. That problem is 
> > > > real while latencies do not seem to be that eminent.
> > > 
> > > If they don't expect EINTR then they sure don't expect SIGKILL either!
> > 
> > Why? Each syscall already is killable as the task might be killed by the OOM 
> > killer.
> 
> Not all syscalls are interruptible - for example sys_sync() isn't:

I guess we are talking past each other. What I meant was that while
all syscalls are allowed not to return to userspace because the task
might get killed, not all of them are prepared to be interrupted by
a signal and return EINTR. None of the man pages of mmap, mremap,
mlock or mprotect list EINTR as a possibility, so I would be really
afraid of returning an unexpected error code.
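
Put differently (illustrative snippet, not taken from the series): the
killable variant's new error path is only taken when a fatal signal is
pending, so the EINTR which the man pages do not document never actually
becomes visible to userspace:

	if (down_write_killable(&mm->mmap_sem))
		/*
		 * Only reached with a fatal signal pending; the task is
		 * torn down on its way back towards userspace, so no
		 * caller ever observes this EINTR.
		 */
		return -EINTR;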

> SYSCALL_DEFINE0(sync)
> {
>         int nowait = 0, wait = 1;
> 
>         wakeup_flusher_threads(0, WB_REASON_SYNC);
>         iterate_supers(sync_inodes_one_sb, NULL);
>         iterate_supers(sync_fs_one_sb, &nowait);
>         iterate_supers(sync_fs_one_sb, &wait);
>         iterate_bdevs(fdatawrite_one_bdev, NULL);
>         iterate_bdevs(fdatawait_one_bdev, NULL);
>         if (unlikely(laptop_mode))
>                 laptop_sync_completion();
>         return 0;
> }
> 
> > > There's a (very) low number of system calls that are not interruptible, but 
> > > the vast majority is.
> > 
> > That might be true. I just fail to see how this is related to the
> > particular problem I am trying to solve. As I've said those callsites
> > which cause problems with latencies can be later converted to
> > interruptible waiting trivially.
> 
> So my problem as I see it is the following: you are adding a rare API to an 
> already complex locking interface,

I have mimicked an API which we already have for mutexes. Sure, it is not
used much yet, but I guess other callsites might benefit from using it
as well. I haven't explored that because I am trying to focus on the
OOM problem currently. I have tried hard not to complicate the core
locking code just because of the new API. If you have any concerns there
I am willing to look at it.

> further complicating already complicated MM code paths in various
> ways. Only to help a case that is a third type of rare:  OOM-kill.

OOM livelocks resp. deadlocks are not unheard of; we have seen
reports in the past. The fact of the matter is that it is currently
easy for an unprivileged user to lock up the system completely
(the OOM killer tries to kill a task which is blocked on a
lock - e.g. i_mutex - held by another process which loops
inside the memory allocator). This issue has been discussed a lot
recently (e.g. the LWN coverage of the discussion at LSFMM last year,
https://lwn.net/Articles/627419/). That is the reason why I am trying to
make the OOM handling more reliable (by introducing a reliable async oom
reclaim aka the oom_reaper) and that sounds like a sufficient justification
to me.

Does this make more sense now?

> 
> That's a surefire whack-a-mole nest of bugs, if I've ever seen one.
> 
> What I am suggesting instead is a slight modification of the
> concept: to re-phrase the problem set and think in broader terms
> of interruptability: make certain MM operations, especially ones
> which tend to hinder OOM-kill latencies, more interruptible - which
> implicitly also makes them more OOM-killable.

Yeah, I have understood your suggestion and I see it as a more generic
approach as well, but my counter-argument was that some syscalls might
not be expected to return EINTR. My quick check of memory management
syscalls showed that EINTR is not described as a potential error code.
Or am I missing something here and you are suggesting an ERESTARTNOINTR
return path from down_write_interruptible?

Thanks
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 0/12] introduce down_write_killable for rw_semaphore
@ 2016-03-09 14:41             ` Michal Hocko
  0 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-03-09 14:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Linus Torvalds, Peter Zijlstra

On Wed 09-03-16 14:43:39, Ingo Molnar wrote:
> 
> * Michal Hocko <mhocko@kernel.org> wrote:
> 
> > On Wed 09-03-16 14:17:10, Ingo Molnar wrote:
> > > 
> > > * Michal Hocko <mhocko@kernel.org> wrote:
> > > 
> > > > >  [...] this is a follow up work for oom_reaper [1]. As the async OOM killing 
> > > > >  depends on oom_sem for read we would really appreciate if a holder for write 
> > > > >  stood in the way. This patchset is changing many of down_write calls to be 
> > > > >  killable to help those cases when the writer is blocked and waiting for 
> > > > >  readers to release the lock and so help __oom_reap_task to process the oom 
> > > > >  victim.
> > > > > 
> > > > > there seems to be a misunderstanding: if a writer is blocked waiting for 
> > > > > readers then no new readers are allowed - the writer will get its turn the 
> > > > > moment all existing readers drop the lock.
> > > > 
> > > > Readers might be blocked e.g. on the memory allocation which cannot proceed due 
> > > > to OOM. Such a reader might be operating on a remote mm.
> > > 
> > > Doing complex allocations with the mm locked looks fragile no matter what: we 
> > > should add debugging code that warns if allocations are done with a remote mm 
> > > locked. (it should be trivial)
> > 
> > No matter how fragile is that it is not something non-existent. Just
> > have a look at use_mm for example. We definitely do not want to warn
> > about those, right?
> 
> Sure we care about eliminating fragility, and usage does not seem to be widespread 
> at all:

This was just an example. We have others. khugepaged which tries to
create THP in the background or proc per pid files handlers might hold
the lock while allocating and who knows what else.
 
>  triton:~/tip> git grep -w use_mm
> 
>  drivers/staging/rdma/hfi1/user_sdma.c:          use_mm(req->pq->user_mm);
>  drivers/usb/gadget/function/f_fs.c:             use_mm(io_data->mm);
>  drivers/usb/gadget/legacy/inode.c:      use_mm(mm);
>  drivers/vhost/vhost.c:  use_mm(dev->mm);
> 
> I think we also want to keep our general flexibility wrt. eventually turning the 
> mmap_sem into a spinlock ...

Many places which are holding mmap_sem are sleepable right (e.g. doing
GFP_KERNEL allocation) now and I am not really sure we can change all of
them to not be.
 
> > > > I am not against interruptible variant as well but I suspect that some paths 
> > > > are not expected to return EINTR. I haven't checked them for this but 
> > > > killable is sufficient for the problem I am trying to solve. That problem is 
> > > > real while latencies do not seem to be that eminent.
> > > 
> > > If they don't expect EINTR then they sure don't expect SIGKILL either!
> > 
> > Why? Each syscall already is killable as the task might be killed by the OOM 
> > killer.
> 
> Not all syscalls are interruptible - for example sys_sync() isn't:

I guess we are talking past each other. What I meant was that while
all syscalls are allowed to not return to the userspace because the
task might get killed but not all of them accept to get interrupted by
a signal and return with EINTR. None of the man page of mmap, mremap,
mlock, mprotect list EINTR as a possibility so I would be really afraid
of returning an unexpected error code.

> SYSCALL_DEFINE0(sync)
> {
>         int nowait = 0, wait = 1;
> 
>         wakeup_flusher_threads(0, WB_REASON_SYNC);
>         iterate_supers(sync_inodes_one_sb, NULL);
>         iterate_supers(sync_fs_one_sb, &nowait);
>         iterate_supers(sync_fs_one_sb, &wait);
>         iterate_bdevs(fdatawrite_one_bdev, NULL);
>         iterate_bdevs(fdatawait_one_bdev, NULL);
>         if (unlikely(laptop_mode))
>                 laptop_sync_completion();
>         return 0;
> }
> 
> > > There's a (very) low number of system calls that are not interruptible, but 
> > > the vast majority is.
> > 
> > That might be true. I just fail to see how this is related to the
> > particular problem I am trying to solve. As I've said those callsites
> > which cause problems with latencies can be later converted to
> > interruptible waiting trivially.
> 
> So my problem as I see it is the following: you are adding a rare API to an 
> already complex locking interface,

I have mimicked an API which we already have for mutex. Sure it is not
used much yet but I guess other callsites might benefit from using it
as well. I haven't explored that because I am trying to focus on the
OOM problem currently. I have tried hard to not complicate the core
locking code just because of the new API. If you have any concerns there
I am willing to look at it.

> further complicating already complicated MM code paths in various
> ways. Only to help a case that is a third type of rare:  OOM-kill.

Seeing OOM livelocks or deadlocks is not something unheard of. We
have seen some reports in the past. The fact of the matter is that it
is currently easy for an unprivileged user to lock up the system
completely (the OOM killer tries to kill a task which is blocked on a
lock - e.g. i_mutex - which is held by another process which loops
inside the memory allocator). This issue has been discussed a lot
recently (e.g. the LWN coverage of the discussion at LSFMM last year,
https://lwn.net/Articles/627419/). That is the reason why I am trying to
make the OOM handling more reliable (by introducing a reliable async oom
reclaim aka the oom_reaper) and that sounds like a sufficient
justification to me.

Does this make more sense now?

> 
> That's a surefire whack-a-mole nest of bugs, if I've ever seen one.
> 
> What I am suggesting instead is a slight modification of the
> concept: to re-phrase the problem set and think in broader terms
> of interruptability: make certain MM operations, especially ones
> which tend to hinder OOM-kill latencies, more interruptible - which
> implicitly also makes them more OOM-killable.

Yeah, I have understood your suggestion and I see it as a more generic
approach as well, but my counter argument was that some syscalls are not
expected to return EINTR. My quick check of the memory management syscalls
showed that EINTR is not described as a potential error code. Or am I
missing something here and you are suggesting an ERESTARTNOINTR return
path from down_write_interruptible?
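
(To spell out the difference I am worried about, sketched on a hypothetical
down_write_interruptible conversion:

	if (down_write_interruptible(&mm->mmap_sem))
		return -EINTR;		/* a new error code leaks to userspace */

versus

	if (down_write_interruptible(&mm->mmap_sem))
		return -ERESTARTNOINTR;	/* restarted transparently after the signal */

The latter never becomes visible to userspace but it is only safe for call
sites which can be restarted from scratch with unchanged arguments.)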

Thanks
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 0/12] introduce down_write_killable for rw_semaphore
  2016-03-09 14:41             ` Michal Hocko
@ 2016-03-10 10:24               ` Ingo Molnar
  -1 siblings, 0 replies; 64+ messages in thread
From: Ingo Molnar @ 2016-03-10 10:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Linus Torvalds, Peter Zijlstra


* Michal Hocko <mhocko@kernel.org> wrote:

> > > Why? Each syscall already is killable as the task might be killed by the OOM 
> > > killer.
> > 
> > Not all syscalls are interruptible - for example sys_sync() isn't:
> 
> I guess we are talking past each other. [...]

Heh, you are being polite, I think what happened is that I was being dense and 
didn't understand your point:

> [...] What I meant was that while all syscalls are allowed to not return to 
> userspace because the task might get killed, not all of them are expected to get 
> interrupted by a signal and return with EINTR. None of the man pages of mmap, 
> mremap, mlock or mprotect list EINTR as a possibility, so I would be really afraid 
> of returning an unexpected error code.

Indeed.

> Does this make more sense now?

It does!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 10/12] x86, rwsem: simplify __down_write
  2016-02-03  8:10     ` Ingo Molnar
@ 2016-06-03 16:13       ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2016-06-03 16:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michal Hocko, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko,
	Linus Torvalds, Paul E. McKenney

On Wed, Feb 03, 2016 at 09:10:16AM +0100, Ingo Molnar wrote:
> 
> * Michal Hocko <mhocko@kernel.org> wrote:
> 
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > The x86 implementation of __down_write uses inline asm to optimize the
> > code flow. This however requires that it go over an additional hop
> > for the slow path, call_rwsem_down_write_failed, which has to
> > save_common_regs/restore_common_regs to preserve the calling convention.
> > This, however, doesn't buy much because the fast path only saves one
> > register push/pop (rdx) when compared to the generic implementation:
> > 
> > Before:
> > 0000000000000019 <down_write>:
> >   19:   e8 00 00 00 00          callq  1e <down_write+0x5>
> >   1e:   55                      push   %rbp
> >   1f:   48 ba 01 00 00 00 ff    movabs $0xffffffff00000001,%rdx
> >   26:   ff ff ff
> >   29:   48 89 f8                mov    %rdi,%rax
> >   2c:   48 89 e5                mov    %rsp,%rbp
> >   2f:   f0 48 0f c1 10          lock xadd %rdx,(%rax)
> >   34:   85 d2                   test   %edx,%edx
> >   36:   74 05                   je     3d <down_write+0x24>
> >   38:   e8 00 00 00 00          callq  3d <down_write+0x24>
> >   3d:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
> >   44:   00 00
> >   46:   5d                      pop    %rbp
> >   47:   48 89 47 38             mov    %rax,0x38(%rdi)
> >   4b:   c3                      retq
> > 
> > After:
> > 0000000000000019 <down_write>:
> >   19:   e8 00 00 00 00          callq  1e <down_write+0x5>
> >   1e:   55                      push   %rbp
> >   1f:   48 b8 01 00 00 00 ff    movabs $0xffffffff00000001,%rax
> >   26:   ff ff ff
> >   29:   48 89 e5                mov    %rsp,%rbp
> >   2c:   53                      push   %rbx
> >   2d:   48 89 fb                mov    %rdi,%rbx
> >   30:   f0 48 0f c1 07          lock xadd %rax,(%rdi)
> >   35:   48 85 c0                test   %rax,%rax
> >   38:   74 05                   je     3f <down_write+0x26>
> >   3a:   e8 00 00 00 00          callq  3f <down_write+0x26>
> >   3f:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
> >   46:   00 00
> >   48:   48 89 43 38             mov    %rax,0x38(%rbx)
> >   4c:   5b                      pop    %rbx
> >   4d:   5d                      pop    %rbp
> >   4e:   c3                      retq
> 
> I'm not convinced about the removal of this optimization at all.

So I've been playing with this again because Jason's atomic_long_t
patches made a mess of things.

(similar findings for both ia64 and s390, suggesting killing all
arch/*/include/asm/rwsem.h might actually be an option).

Here's what I get (gcc-5.3.0):

# size x86_64-defconfig/vmlinux.{orig,mod}
   text    data     bss     dec     hex filename
10193229        4340512 1105920 15639661         eea46d x86_64-defconfig/vmlinux.orig
10193101        4340512 1105920 15639533         eea3ed x86_64-defconfig/vmlinux.mod

# objdump -D x86_64-defconfig/vmlinux.orig | awk '/<[^>]*>:$/ { p=0; } /<down_write>:/ { p=1; } { if (p) print $0; }'
ffffffff818d0d80 <down_write>:
ffffffff818d0d80:       55                      push   %rbp
ffffffff818d0d81:       48 89 e5                mov    %rsp,%rbp
ffffffff818d0d84:       53                      push   %rbx
ffffffff818d0d85:       48 89 fb                mov    %rdi,%rbx
ffffffff818d0d88:       e8 33 e3 ff ff          callq  ffffffff818cf0c0 <_cond_resched>
ffffffff818d0d8d:       48 ba 01 00 00 00 ff    movabs $0xffffffff00000001,%rdx
ffffffff818d0d94:       ff ff ff 
ffffffff818d0d97:       48 89 d8                mov    %rbx,%rax
ffffffff818d0d9a:       f0 48 0f c1 10          lock xadd %rdx,(%rax)
ffffffff818d0d9f:       85 d2                   test   %edx,%edx
ffffffff818d0da1:       74 05                   je     ffffffff818d0da8 <down_write+0x28>
ffffffff818d0da3:       e8 b8 d0 a5 ff          callq  ffffffff8132de60 <call_rwsem_down_write_failed>
ffffffff818d0da8:       65 48 8b 04 25 00 c4    mov    %gs:0xc400,%rax
ffffffff818d0daf:       00 00 
ffffffff818d0db1:       48 89 43 20             mov    %rax,0x20(%rbx)
ffffffff818d0db5:       5b                      pop    %rbx
ffffffff818d0db6:       5d                      pop    %rbp
ffffffff818d0db7:       c3                      retq   
ffffffff818d0db8:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
ffffffff818d0dbf:       00 

^C
# objdump -D x86_64-defconfig/vmlinux.mod | awk '/<[^>]*>:$/ { p=0; } /<down_write>:/ { p=1; } { if (p) print $0; }'
ffffffff818d0cf0 <down_write>:
ffffffff818d0cf0:       55                      push   %rbp
ffffffff818d0cf1:       48 89 e5                mov    %rsp,%rbp
ffffffff818d0cf4:       53                      push   %rbx
ffffffff818d0cf5:       48 89 fb                mov    %rdi,%rbx
ffffffff818d0cf8:       e8 23 e3 ff ff          callq  ffffffff818cf020 <_cond_resched>
ffffffff818d0cfd:       48 b8 01 00 00 00 ff    movabs $0xffffffff00000001,%rax
ffffffff818d0d04:       ff ff ff 
ffffffff818d0d07:       f0 48 0f c1 03          lock xadd %rax,(%rbx)
ffffffff818d0d0c:       48 85 c0                test   %rax,%rax
ffffffff818d0d0f:       75 10                   jne    ffffffff818d0d21 <down_write+0x31>
ffffffff818d0d11:       65 48 8b 04 25 00 c4    mov    %gs:0xc400,%rax
ffffffff818d0d18:       00 00 
ffffffff818d0d1a:       48 89 43 20             mov    %rax,0x20(%rbx)
ffffffff818d0d1e:       5b                      pop    %rbx
ffffffff818d0d1f:       5d                      pop    %rbp
ffffffff818d0d20:       c3                      retq   
ffffffff818d0d21:       48 89 df                mov    %rbx,%rdi
ffffffff818d0d24:       e8 f7 06 00 00          callq  ffffffff818d1420 <rwsem_down_write_failed>
ffffffff818d0d29:       eb e6                   jmp    ffffffff818d0d11 <down_write+0x21>
ffffffff818d0d2b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
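
(For reference, the .mod image above is the plain C fast path from the
generic header; after Jason's atomic_long_t conversion it is roughly the
following - quoting from memory, so the exact helper spelling may differ:

static inline void __down_write(struct rw_semaphore *sem)
{
	long tmp;

	/* an uncontended acquire leaves the count at exactly the write bias */
	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
					     (atomic_long_t *)&sem->count);
	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
		rwsem_down_write_failed(sem);
}
)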

---

diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index aeac434..1262556 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -10,3 +10,4 @@ generic-y += dma-contiguous.h
 generic-y += early_ioremap.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
+generic-y += rwsem.h
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
deleted file mode 100644
index 089ced4..0000000
--- a/arch/x86/include/asm/rwsem.h
+++ /dev/null
@@ -1,217 +0,0 @@
-/* rwsem.h: R/W semaphores implemented using XADD/CMPXCHG for i486+
- *
- * Written by David Howells (dhowells@redhat.com).
- *
- * Derived from asm-x86/semaphore.h
- *
- *
- * The MSW of the count is the negated number of active writers and waiting
- * lockers, and the LSW is the total number of active locks
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffff0001 for the case of an
- * uncontended lock. This can be determined because XADD returns the old value.
- * Readers increment by 1 and see a positive value when uncontended, negative
- * if there are writers (and maybe) readers waiting (in which case it goes to
- * sleep).
- *
- * The value of WAITING_BIAS supports up to 32766 waiting processes. This can
- * be extended to 65534 by manually checking the whole MSW rather than relying
- * on the S flag.
- *
- * The value of ACTIVE_BIAS supports up to 65535 active processes.
- *
- * This should be totally fair - if anything is waiting, a process that wants a
- * lock will go to the back of the queue. When the currently active lock is
- * released, if there's a writer at the front of the queue, then that and only
- * that will be woken up; if there's a bunch of consecutive readers at the
- * front, then they'll all be woken up, but no other readers will be.
- */
-
-#ifndef _ASM_X86_RWSEM_H
-#define _ASM_X86_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-#include <asm/asm.h>
-
-/*
- * The bias values and the counter type limits the number of
- * potential readers/writers to 32767 for 32 bits and 2147483647
- * for 64 bits.
- */
-
-#ifdef CONFIG_X86_64
-# define RWSEM_ACTIVE_MASK		0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK		0x0000ffffL
-#endif
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000L
-#define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	asm volatile("# beginning down_read\n\t"
-		     LOCK_PREFIX _ASM_INC "(%1)\n\t"
-		     /* adds 0x00000001 */
-		     "  jns        1f\n"
-		     "  call call_rwsem_down_read_failed\n"
-		     "1:\n\t"
-		     "# ending down_read\n\t"
-		     : "+m" (sem->count)
-		     : "a" (sem)
-		     : "memory", "cc");
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	long result, tmp;
-	asm volatile("# beginning __down_read_trylock\n\t"
-		     "  mov          %0,%1\n\t"
-		     "1:\n\t"
-		     "  mov          %1,%2\n\t"
-		     "  add          %3,%2\n\t"
-		     "  jle	     2f\n\t"
-		     LOCK_PREFIX "  cmpxchg  %2,%0\n\t"
-		     "  jnz	     1b\n\t"
-		     "2:\n\t"
-		     "# ending __down_read_trylock\n\t"
-		     : "+m" (sem->count), "=&a" (result), "=&r" (tmp)
-		     : "i" (RWSEM_ACTIVE_READ_BIAS)
-		     : "memory", "cc");
-	return result >= 0 ? 1 : 0;
-}
-
-/*
- * lock for writing
- */
-#define ____down_write(sem, slow_path)			\
-({							\
-	long tmp;					\
-	struct rw_semaphore* ret;			\
-	asm volatile("# beginning down_write\n\t"	\
-		     LOCK_PREFIX "  xadd      %1,(%3)\n\t"	\
-		     /* adds 0xffff0001, returns the old value */ \
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
-		     /* was the active mask 0 before? */\
-		     "  jz        1f\n"			\
-		     "  call " slow_path "\n"		\
-		     "1:\n"				\
-		     "# ending down_write"		\
-		     : "+m" (sem->count), "=d" (tmp), "=a" (ret)	\
-		     : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
-		     : "memory", "cc");			\
-	ret;						\
-})
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	____down_write(sem, "call_rwsem_down_write_failed");
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	if (IS_ERR(____down_write(sem, "call_rwsem_down_write_failed_killable")))
-		return -EINTR;
-
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	long result, tmp;
-	asm volatile("# beginning __down_write_trylock\n\t"
-		     "  mov          %0,%1\n\t"
-		     "1:\n\t"
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t"
-		     /* was the active mask 0 before? */
-		     "  jnz          2f\n\t"
-		     "  mov          %1,%2\n\t"
-		     "  add          %3,%2\n\t"
-		     LOCK_PREFIX "  cmpxchg  %2,%0\n\t"
-		     "  jnz	     1b\n\t"
-		     "2:\n\t"
-		     "  sete         %b1\n\t"
-		     "  movzbl       %b1, %k1\n\t"
-		     "# ending __down_write_trylock\n\t"
-		     : "+m" (sem->count), "=&a" (result), "=&r" (tmp)
-		     : "er" (RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc");
-	return result;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	long tmp;
-	asm volatile("# beginning __up_read\n\t"
-		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"
-		     /* subtracts 1, returns the old value */
-		     "  jns        1f\n\t"
-		     "  call call_rwsem_wake\n" /* expects old value in %edx */
-		     "1:\n"
-		     "# ending __up_read\n"
-		     : "+m" (sem->count), "=d" (tmp)
-		     : "a" (sem), "1" (-RWSEM_ACTIVE_READ_BIAS)
-		     : "memory", "cc");
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	long tmp;
-	asm volatile("# beginning __up_write\n\t"
-		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"
-		     /* subtracts 0xffff0001, returns the old value */
-		     "  jns        1f\n\t"
-		     "  call call_rwsem_wake\n" /* expects old value in %edx */
-		     "1:\n\t"
-		     "# ending __up_write\n"
-		     : "+m" (sem->count), "=d" (tmp)
-		     : "a" (sem), "1" (-RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc");
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	asm volatile("# beginning __downgrade_write\n\t"
-		     LOCK_PREFIX _ASM_ADD "%2,(%1)\n\t"
-		     /*
-		      * transitions 0xZZZZ0001 -> 0xYYYY0001 (i386)
-		      *     0xZZZZZZZZ00000001 -> 0xYYYYYYYY00000001 (x86_64)
-		      */
-		     "  jns       1f\n\t"
-		     "  call call_rwsem_downgrade_wake\n"
-		     "1:\n\t"
-		     "# ending __downgrade_write\n"
-		     : "+m" (sem->count)
-		     : "a" (sem), "er" (-RWSEM_WAITING_BIAS)
-		     : "memory", "cc");
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ASM_X86_RWSEM_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 72a5767..c47dd5f 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -22,7 +22,6 @@ obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
 lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
-lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o
diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
deleted file mode 100644
index bf2c607..0000000
--- a/arch/x86/lib/rwsem.S
+++ /dev/null
@@ -1,144 +0,0 @@
-/*
- * x86 semaphore implementation.
- *
- * (C) Copyright 1999 Linus Torvalds
- *
- * Portions Copyright 1999 Red Hat, Inc.
- *
- *	This program is free software; you can redistribute it and/or
- *	modify it under the terms of the GNU General Public License
- *	as published by the Free Software Foundation; either version
- *	2 of the License, or (at your option) any later version.
- *
- * rw semaphores implemented November 1999 by Benjamin LaHaise <bcrl@kvack.org>
- */
-
-#include <linux/linkage.h>
-#include <asm/alternative-asm.h>
-#include <asm/frame.h>
-
-#define __ASM_HALF_REG(reg)	__ASM_SEL(reg, e##reg)
-#define __ASM_HALF_SIZE(inst)	__ASM_SEL(inst##w, inst##l)
-
-#ifdef CONFIG_X86_32
-
-/*
- * The semaphore operations have a special calling sequence that
- * allow us to do a simpler in-line version of them. These routines
- * need to convert that sequence back into the C sequence when
- * there is contention on the semaphore.
- *
- * %eax contains the semaphore pointer on entry. Save the C-clobbered
- * registers (%eax, %edx and %ecx) except %eax which is either a return
- * value or just gets clobbered. Same is true for %edx so make sure GCC
- * reloads it after the slow path, by making it hold a temporary, for
- * example see ____down_write().
- */
-
-#define save_common_regs \
-	pushl %ecx
-
-#define restore_common_regs \
-	popl %ecx
-
-	/* Avoid uglifying the argument copying x86-64 needs to do. */
-	.macro movq src, dst
-	.endm
-
-#else
-
-/*
- * x86-64 rwsem wrappers
- *
- * This interfaces the inline asm code to the slow-path
- * C routines. We need to save the call-clobbered regs
- * that the asm does not mark as clobbered, and move the
- * argument from %rax to %rdi.
- *
- * NOTE! We don't need to save %rax, because the functions
- * will always return the semaphore pointer in %rax (which
- * is also the input argument to these helpers)
- *
- * The following can clobber %rdx because the asm clobbers it:
- *   call_rwsem_down_write_failed
- *   call_rwsem_wake
- * but %rdi, %rsi, %rcx, %r8-r11 always need saving.
- */
-
-#define save_common_regs \
-	pushq %rdi; \
-	pushq %rsi; \
-	pushq %rcx; \
-	pushq %r8;  \
-	pushq %r9;  \
-	pushq %r10; \
-	pushq %r11
-
-#define restore_common_regs \
-	popq %r11; \
-	popq %r10; \
-	popq %r9; \
-	popq %r8; \
-	popq %rcx; \
-	popq %rsi; \
-	popq %rdi
-
-#endif
-
-/* Fix up special calling conventions */
-ENTRY(call_rwsem_down_read_failed)
-	FRAME_BEGIN
-	save_common_regs
-	__ASM_SIZE(push,) %__ASM_REG(dx)
-	movq %rax,%rdi
-	call rwsem_down_read_failed
-	__ASM_SIZE(pop,) %__ASM_REG(dx)
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_read_failed)
-
-ENTRY(call_rwsem_down_write_failed)
-	FRAME_BEGIN
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_down_write_failed
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_write_failed)
-
-ENTRY(call_rwsem_down_write_failed_killable)
-	FRAME_BEGIN
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_down_write_failed_killable
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_write_failed_killable)
-
-ENTRY(call_rwsem_wake)
-	FRAME_BEGIN
-	/* do nothing if still outstanding active readers */
-	__ASM_HALF_SIZE(dec) %__ASM_HALF_REG(dx)
-	jnz 1f
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_wake
-	restore_common_regs
-1:	FRAME_END
-	ret
-ENDPROC(call_rwsem_wake)
-
-ENTRY(call_rwsem_downgrade_wake)
-	FRAME_BEGIN
-	save_common_regs
-	__ASM_SIZE(push,) %__ASM_REG(dx)
-	movq %rax,%rdi
-	call rwsem_downgrade_wake
-	__ASM_SIZE(pop,) %__ASM_REG(dx)
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_downgrade_wake)

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [RFC 10/12] x86, rwsem: simplify __down_write
  2016-06-03 16:13       ` Peter Zijlstra
@ 2016-06-03 22:34         ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2016-06-03 22:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michal Hocko, LKML, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	David S. Miller, Tony Luck, Andrew Morton, Chris Zankel,
	Max Filippov, x86, linux-alpha, linux-ia64, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, Michal Hocko,
	Linus Torvalds, Paul E. McKenney, Jason Low

On Fri, Jun 03, 2016 at 06:13:39PM +0200, Peter Zijlstra wrote:
> So I've been playing with this again because Jason's atomic_long_t
> patches made a mess of things.
> 
> (similar findings for both ia64 and s390, suggesting killing all
> arch/*/include/asm/rwsem.h might actually be an option).
> 

Blergh; so looking at more asm there's still a few tricks we cannot do.
So while overall size is down, some paths do end up more expensive. (It
typically boils down to creative use of condition flags, which is very
hard in C)
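
To give one concrete example: the asm __up_read gets the old count into a
register for the wakeup helper *and* branches on the condition flags of the
new sum, both from the single xadd, whereas the C variant has to re-test the
value it gets back; roughly (sketch of the generic path):

	long tmp = atomic_long_dec_return_release((atomic_long_t *)&sem->count);
	if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0))
		rwsem_wake(sem);

and that extra test/compare is where the more expensive paths come from.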

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 10/12] x86, rwsem: simplify __down_write
  2016-06-03 16:13       ` Peter Zijlstra
@ 2016-06-09 14:40         ` David Howells
  -1 siblings, 0 replies; 64+ messages in thread
From: David Howells @ 2016-06-09 14:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, Ingo Molnar, Michal Hocko, LKML, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, David S. Miller, Tony Luck,
	Andrew Morton, Chris Zankel, Max Filippov, x86, linux-alpha,
	linux-ia64, linux-s390, linux-sh, sparclinux, linux-xtensa,
	linux-arch, Michal Hocko, Linus Torvalds, Paul E. McKenney,
	Jason Low

Peter Zijlstra <peterz@infradead.org> wrote:

> Blergh; so looking at more asm there's still a few tricks we cannot do.
> So while overall size is down, some paths do end up more expensive. (It
> typically boils down to creative use of condition flags, which is very
> hard in C)

It can be done using ISO __atomic_fetch_add() and suchlike.
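
E.g. something like this (just a sketch of the idea, not compile-tested):

	long old = __atomic_fetch_add(&sem->count, RWSEM_ACTIVE_WRITE_BIAS,
				      __ATOMIC_ACQUIRE);
	if (old != 0)
		rwsem_down_write_failed(sem);

and leave the instruction selection to the compiler.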

David

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 10/12] x86, rwsem: simplify __down_write
  2016-06-09 14:40         ` David Howells
@ 2016-06-09 17:36           ` Peter Zijlstra
  -1 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2016-06-09 17:36 UTC (permalink / raw)
  To: David Howells
  Cc: Ingo Molnar, Michal Hocko, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, David S. Miller, Tony Luck, Andrew Morton,
	Chris Zankel, Max Filippov, x86, linux-alpha, linux-ia64,
	linux-s390, linux-sh, sparclinux, linux-xtensa, linux-arch,
	Michal Hocko, Linus Torvalds, Paul E. McKenney, Jason Low

On Thu, Jun 09, 2016 at 03:40:58PM +0100, David Howells wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Blergh; so looking at more asm there's still a few tricks we cannot do.
> > So while overall size is down, some paths do end up more expensive. (It
> > typically boils down to creative use of condition flags, which is very
> > hard in C)
> 
> It can be done using ISO __atomic_fetch_add() and suchlike.

(ISO-C11, ISO as such is a bad abbreviation I think)

Maybe, but we're almost there with __GCC_ASM_FLAG_OUTPUTS__.

atomic_long_add_negative() can be made to do inc;j(n)s for __down_read.

the try_cmpxchg family you wanted to add independent from the ISO-C11
bits can do the cmpxchg-j(n)z for __down_{read,write}_trylock.

That only leaves us wanting an atomic_long_fetch_add_negative() for
__up_{read,write}().

Although I suppose, for this to be of use for our weakly ordered
friends, we need _relaxed versions of all that (so that _acquire and
_release variants are generated).
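
For the x86 side the flag outputs make the add-and-test-sign primitive almost
free; roughly (a sketch assuming gcc-6 "=@cc" asm flag outputs, 64-bit only,
hypothetical helper name):

static inline bool rwsem_add_negative(long i, long *count)
{
	bool s;

	asm volatile("lock; addq %2,%0"
		     : "+m" (*count), "=@ccs" (s)	/* s = sign flag of the sum */
		     : "er" (i)
		     : "memory");
	return s;
}

so the inc;j(n)s pattern can fall out without a setcc whenever the result
only feeds a branch.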



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC 10/12] x86, rwsem: simplify __down_write
  2016-06-09 17:36           ` Peter Zijlstra
@ 2016-06-10 16:39             ` Paul E. McKenney
  -1 siblings, 0 replies; 64+ messages in thread
From: Paul E. McKenney @ 2016-06-10 16:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Howells, Ingo Molnar, Michal Hocko, LKML, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, David S. Miller, Tony Luck,
	Andrew Morton, Chris Zankel, Max Filippov, x86, linux-alpha,
	linux-ia64, linux-s390, linux-sh, sparclinux, linux-xtensa,
	linux-arch, Michal Hocko, Linus Torvalds, Jason Low

On Thu, Jun 09, 2016 at 07:36:40PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 09, 2016 at 03:40:58PM +0100, David Howells wrote:
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > Blergh; so looking at more asm there's still a few tricks we cannot do.
> > > So while overall size is down, some paths do end up more expensive. (It
> > > typically boils down to creative use of condition flags, which is very
> > > hard in C)
> > 
> > It can be done using ISO __atomic_fetch_add() and suchlike.
> 
> (ISO-C11, ISO as such is a bad abbreviation I think)
> 
> Maybe, but we're almost there with __GCC_ASM_FLAG_OUTPUTS__.
> 
> atomic_long_add_negative() can be made to do inc;j(n)s for __down_read.
> 
> the try_cmpxchg family you wanted to add independent from the ISO-C11
> bits can do the cmpxchg-j(n)z for __down_{read,write}_trylock.
> 
> That only leaves us wanting an atomic_long_fetch_add_negative() for
> __up_{read,write}().
> 
> Although I suppose, for this to be of use for our weakly ordered
> friends, we need _relaxed versions of all that (so that _acquire and
> _release variants are generated).

Historically, the compilers have won this sort of contest over the
long term.  That said, there is nothing quite like raising the bar for
them to help them generate decent code.  So, David and Peter, I am behind
both of you 100%.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread

Thread overview: 64+ messages
2016-02-02 20:19 [RFC 0/12] introduce down_write_killable for rw_semaphore Michal Hocko
2016-02-02 20:19 ` Michal Hocko
2016-02-02 20:19 ` [RFC 01/12] locking, rwsem: get rid of __down_write_nested Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-02 20:19 ` [RFC 02/12] locking, rwsem: drop explicit memory barriers Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-02 20:19 ` [RFC 03/12] locking, rwsem: introduce basis for down_write_killable Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-02 20:19 ` [RFC 04/12] alpha, rwsem: provide __down_write_killable Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-02 20:19 ` [RFC 05/12] ia64, " Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-02 20:19 ` [RFC 06/12] s390, " Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-02 20:19 ` [RFC 07/12] sh, " Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-03 11:19   ` Sergei Shtylyov
2016-02-03 11:19     ` Sergei Shtylyov
2016-02-03 12:11     ` Michal Hocko
2016-02-03 12:11       ` Michal Hocko
2016-02-02 20:19 ` [RFC 08/12] sparc, " Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-02 20:19 ` [RFC 09/12] xtensa, " Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-02 20:19 ` [RFC 10/12] x86, rwsem: simplify __down_write Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-03  8:10   ` Ingo Molnar
2016-02-03  8:10     ` Ingo Molnar
2016-02-03 12:10     ` Michal Hocko
2016-02-03 12:10       ` Michal Hocko
2016-06-03 16:13     ` Peter Zijlstra
2016-06-03 16:13       ` Peter Zijlstra
2016-06-03 22:34       ` Peter Zijlstra
2016-06-03 22:34         ` Peter Zijlstra
2016-06-09 14:40       ` David Howells
2016-06-09 14:40         ` David Howells
2016-06-09 17:36         ` Peter Zijlstra
2016-06-09 17:36           ` Peter Zijlstra
2016-06-10 16:39           ` Paul E. McKenney
2016-06-10 16:39             ` Paul E. McKenney
2016-02-02 20:19 ` [RFC 11/12] x86, rwsem: provide __down_write_killable Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-17 16:41   ` [RFC 11/12 v1] " Michal Hocko
2016-02-17 16:41     ` Michal Hocko
2016-02-17 16:41     ` Michal Hocko
2016-02-17 16:41     ` Michal Hocko
2016-02-02 20:19 ` [RFC 12/12] locking, rwsem: provide down_write_killable Michal Hocko
2016-02-02 20:19   ` Michal Hocko
2016-02-19 12:15 ` [RFC 0/12] introduce down_write_killable for rw_semaphore Michal Hocko
2016-02-19 12:15   ` Michal Hocko
2016-03-09 12:18 ` Ingo Molnar
2016-03-09 12:18   ` Ingo Molnar
2016-03-09 12:56   ` Michal Hocko
2016-03-09 12:56     ` Michal Hocko
2016-03-09 13:17     ` Ingo Molnar
2016-03-09 13:17       ` Ingo Molnar
2016-03-09 13:28       ` Michal Hocko
2016-03-09 13:28         ` Michal Hocko
2016-03-09 13:43         ` Ingo Molnar
2016-03-09 13:43           ` Ingo Molnar
2016-03-09 14:41           ` Michal Hocko
2016-03-09 14:41             ` Michal Hocko
2016-03-10 10:24             ` Ingo Molnar
2016-03-10 10:24               ` Ingo Molnar
