linux-fsdevel.vger.kernel.org archive mirror
* [RFC PATCH v2 0/2] locking/rwsem: Fix DEBUG_RWSEM warning from thaw_sup
@ 2018-05-14 19:31 Waiman Long
  2018-05-14 19:31 ` [RFC PATCH v2 1/2] locking/rwsem: Add a new RWSEM_WRITER_OWNED_NOSPIN flag Waiman Long
  2018-05-14 19:31 ` [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release() Waiman Long
  0 siblings, 2 replies; 18+ messages in thread
From: Waiman Long @ 2018-05-14 19:31 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-fsdevel, Davidlohr Bueso,
	Theodore Y. Ts'o, Oleg Nesterov, Amir Goldstein, Jan Kara,
	Waiman Long

My original patch (https://lkml.org/lkml/2018/4/4/447) to fix this issue
probably won't work. This is my second attempt to fix it.

I don't have the setup to reproduce the problem. Could someone try it to see
if it can eliminate the warning?

Waiman Long (2):
  locking/rwsem: Add a new RWSEM_WRITER_OWNED_NOSPIN flag
  locking/percpu-rwsem: Mark rwsem as non-spinnable in
    percpu_rwsem_release()

 include/linux/percpu-rwsem.h |  6 +++---
 include/linux/rwsem.h        | 10 ++++++++++
 kernel/locking/rwsem-xadd.c  | 17 ++++++++---------
 kernel/locking/rwsem.c       | 16 +++++++++++++++-
 kernel/locking/rwsem.h       | 37 ++++++++++++++++++++++++++++++-------
 5 files changed, 66 insertions(+), 20 deletions(-)

-- 
1.8.3.1

* [RFC PATCH v2 1/2] locking/rwsem: Add a new RWSEM_WRITER_OWNED_NOSPIN flag
  2018-05-14 19:31 [RFC PATCH v2 0/2] locking/rwsem: Fix DEBUG_RWSEM warning from thaw_sup Waiman Long
@ 2018-05-14 19:31 ` Waiman Long
  2018-05-15  6:59   ` Amir Goldstein
  2018-05-15  8:25   ` Peter Zijlstra
  2018-05-14 19:31 ` [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release() Waiman Long
  1 sibling, 2 replies; 18+ messages in thread
From: Waiman Long @ 2018-05-14 19:31 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-fsdevel, Davidlohr Bueso,
	Theodore Y. Ts'o, Oleg Nesterov, Amir Goldstein, Jan Kara,
	Waiman Long

There are use cases where a rwsem can be acquired by one task, but
released by another task. In these cases, it may not be appropriate
for the lock waiters to spin on the task that acquires the lock.
One example is the filesystem freeze/thaw code.

To handle such use cases, a new RWSEM_WRITER_OWNED_NOSPIN
flag can now be set in the owner field of the rwsem by the new
rwsem_set_writer_owned_nospin() function to indicate that the rwsem is
writer owned, but optimistic spinning on the rwsem should be disabled.

Later on, the new rwsem_set_writer_owned() function can be called to
set the new owner, if it is known. This function should not be called
without a prior rwsem_set_writer_owned_nospin() call.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/rwsem.h       | 10 ++++++++++
 kernel/locking/rwsem-xadd.c | 17 ++++++++---------
 kernel/locking/rwsem.c      | 16 +++++++++++++++-
 kernel/locking/rwsem.h      | 37 ++++++++++++++++++++++++++++++-------
 4 files changed, 63 insertions(+), 17 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 56707d5..1ddf24b 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -145,6 +145,16 @@ static inline int rwsem_is_contended(struct rw_semaphore *sem)
  */
 extern void downgrade_write(struct rw_semaphore *sem);
 
+#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+extern void rwsem_set_writer_owned_nospin(struct rw_semaphore *sem);
+extern void rwsem_set_writer_owned(struct rw_semaphore *sem,
+				   struct task_struct *task);
+#else
+static inline void rwsem_set_writer_owned_nospin(struct rw_semaphore *sem) { }
+static inline void rwsem_set_writer_owned(struct rw_semaphore *sem,
+					  struct task_struct *task) { }
+#endif
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 /*
  * nested locking. NOTE: rwsems are not allowed to recurse
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index e795908..a27dbb4 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -357,11 +357,8 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 
 	rcu_read_lock();
 	owner = READ_ONCE(sem->owner);
-	if (!rwsem_owner_is_writer(owner)) {
-		/*
-		 * Don't spin if the rwsem is readers owned.
-		 */
-		ret = !rwsem_owner_is_reader(owner);
+	if (!owner || !is_rwsem_owner_spinnable(owner)) {
+		ret = !owner;	/* !owner is spinnable */
 		goto done;
 	}
 
@@ -382,8 +379,10 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner = READ_ONCE(sem->owner);
 
-	if (!rwsem_owner_is_writer(owner))
-		goto out;
+	if (!owner)
+		return true;
+	else if (!is_rwsem_owner_spinnable(owner))
+		return false;
 
 	rcu_read_lock();
 	while (sem->owner == owner) {
@@ -408,12 +407,12 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 		cpu_relax();
 	}
 	rcu_read_unlock();
-out:
+
 	/*
 	 * If there is a new owner or the owner is not set, we continue
 	 * spinning.
 	 */
-	return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
+	return is_rwsem_owner_spinnable(READ_ONCE(sem->owner));
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 30465a2..90e89ee 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -130,7 +130,8 @@ void up_read(struct rw_semaphore *sem)
 void up_write(struct rw_semaphore *sem)
 {
 	rwsem_release(&sem->dep_map, 1, _RET_IP_);
-	DEBUG_RWSEMS_WARN_ON(sem->owner != current);
+	DEBUG_RWSEMS_WARN_ON((sem->owner != current) &&
+			     (sem->owner != RWSEM_WRITER_OWNED_NOSPIN));
 
 	rwsem_clear_owner(sem);
 	__up_write(sem);
@@ -222,4 +223,17 @@ void up_read_non_owner(struct rw_semaphore *sem)
 
 #endif
 
+#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+void rwsem_set_writer_owned_nospin(struct rw_semaphore *sem)
+{
+	__rwsem_set_writer_owned_nospin(sem);
+}
+EXPORT_SYMBOL(rwsem_set_writer_owned_nospin);
 
+void rwsem_set_writer_owned(struct rw_semaphore *sem, struct task_struct *task)
+{
+	DEBUG_RWSEMS_WARN_ON(sem->owner != RWSEM_WRITER_OWNED_NOSPIN);
+	__rwsem_set_writer_owned(sem, task);
+}
+EXPORT_SYMBOL(rwsem_set_writer_owned);
+#endif
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index a17cba8..bbbd5a3 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -11,10 +11,15 @@
  *  2) RWSEM_READER_OWNED
  *     - lock is currently or previously owned by readers (lock is free
  *       or not set by owner yet)
- *  3) Other non-zero value
- *     - a writer owns the lock
+ *  3) RWSEM_WRITER_OWNED_NOSPIN
+ *     - lock is owned by a writer whose lock ownership may be transferred to
+ *	 another task and so spinning on the lock owner should be disabled.
+ *  4) Other non-zero value
+ *     - a writer owns the lock and other writers can spin on the lock owner.
  */
-#define RWSEM_READER_OWNED	((struct task_struct *)1UL)
+#define RWSEM_READER_OWNED		((struct task_struct *)1UL)
+#define RWSEM_WRITER_OWNED_NOSPIN	((struct task_struct *)2UL)
+#define RWSEM_NOSPIN_MASK		3UL
 
 #ifdef CONFIG_DEBUG_RWSEMS
 # define DEBUG_RWSEMS_WARN_ON(c)	DEBUG_LOCKS_WARN_ON(c)
@@ -51,14 +56,32 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 		WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
 }
 
-static inline bool rwsem_owner_is_writer(struct task_struct *owner)
+/*
+ * Mark the rwsem as writer owned, but optimistic spinning should be
+ * disabled.
+ *
+ * The caller must make sure that the rwsem is really writer owned
+ * and the lock won't be freed concurrently with this call.
+ */
+static inline void __rwsem_set_writer_owned_nospin(struct rw_semaphore *sem)
+{
+	WRITE_ONCE(sem->owner, RWSEM_WRITER_OWNED_NOSPIN);
+}
+
+static inline void __rwsem_set_writer_owned(struct rw_semaphore *sem,
+					    struct task_struct *task)
 {
-	return owner && owner != RWSEM_READER_OWNED;
+	WRITE_ONCE(sem->owner, task);
 }
 
-static inline bool rwsem_owner_is_reader(struct task_struct *owner)
+/*
+ * Return true if a rwsem waiter can spin on the rwsem's owner
+ * and steal the lock.
+ * N.B. !owner is considered spinnable.
+ */
+static inline bool is_rwsem_owner_spinnable(struct task_struct *owner)
 {
-	return owner == RWSEM_READER_OWNED;
+	return !((unsigned long)owner & RWSEM_NOSPIN_MASK);
 }
 #else
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
-- 
1.8.3.1

* [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-14 19:31 [RFC PATCH v2 0/2] locking/rwsem: Fix DEBUG_RWSEM warning from thaw_sup Waiman Long
  2018-05-14 19:31 ` [RFC PATCH v2 1/2] locking/rwsem: Add a new RWSEM_WRITER_OWNED_NOSPIN flag Waiman Long
@ 2018-05-14 19:31 ` Waiman Long
  2018-05-15  5:42   ` Amir Goldstein
                     ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Waiman Long @ 2018-05-14 19:31 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-fsdevel, Davidlohr Bueso,
	Theodore Y. Ts'o, Oleg Nesterov, Amir Goldstein, Jan Kara,
	Waiman Long

The percpu_rwsem_release() is called when the ownership of the embedded
rwsem is to be transferred to another task. The new owner, however, may
take a while to get the ownership of the lock via percpu_rwsem_acquire().
During that period, the rwsem is now marked as writer-owned with no
optimistic spinning.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/percpu-rwsem.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index b1f37a8..dd37102 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -131,16 +131,16 @@ static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
 					bool read, unsigned long ip)
 {
 	lock_release(&sem->rw_sem.dep_map, 1, ip);
-#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	if (!read)
-		sem->rw_sem.owner = NULL;
-#endif
+		rwsem_set_writer_owned_nospin(&sem->rw_sem);
 }
 
 static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem,
 					bool read, unsigned long ip)
 {
 	lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL, ip);
+	if (!read)
+		rwsem_set_writer_owned(&sem->rw_sem, current);
 }
 
 #endif
-- 
1.8.3.1

* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-14 19:31 ` [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release() Waiman Long
@ 2018-05-15  5:42   ` Amir Goldstein
  2018-05-15  7:04     ` Amir Goldstein
  2018-05-15 13:45     ` Waiman Long
  2018-05-15  8:35   ` Peter Zijlstra
  2018-05-15  8:51   ` Peter Zijlstra
  2 siblings, 2 replies; 18+ messages in thread
From: Amir Goldstein @ 2018-05-15  5:42 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel,
	linux-fsdevel, Davidlohr Bueso, Theodore Y. Ts'o,
	Oleg Nesterov, Jan Kara

On Mon, May 14, 2018 at 10:31 PM, Waiman Long <longman@redhat.com> wrote:
> The percpu_rwsem_release() is called when the ownership of the embedded
> rwsem is to be transferred to another task. The new owner, however, may
> take a while to get the ownership of the lock via percpu_rwsem_acquire().
> During that period, the rwsem is now marked as writer-owned with no
> optimistic spinning.
>

Waiman,

Thanks for the fix. I will test it soon.

For this commit message I suggest that you add parts of the reproducer
found here:
https://marc.info/?l=linux-fsdevel&m=152622016219975&w=2

Thanks,
Amir.

> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  include/linux/percpu-rwsem.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
> index b1f37a8..dd37102 100644
> --- a/include/linux/percpu-rwsem.h
> +++ b/include/linux/percpu-rwsem.h
> @@ -131,16 +131,16 @@ static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
>                                         bool read, unsigned long ip)
>  {
>         lock_release(&sem->rw_sem.dep_map, 1, ip);
> -#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
>         if (!read)
> -               sem->rw_sem.owner = NULL;
> -#endif
> +               rwsem_set_writer_owned_nospin(&sem->rw_sem);
>  }
>
>  static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem,
>                                         bool read, unsigned long ip)
>  {
>         lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL, ip);
> +       if (!read)
> +               rwsem_set_writer_owned(&sem->rw_sem, current);
>  }
>
>  #endif
> --
> 1.8.3.1
>

* Re: [RFC PATCH v2 1/2] locking/rwsem: Add a new RWSEM_WRITER_OWNED_NOSPIN flag
  2018-05-14 19:31 ` [RFC PATCH v2 1/2] locking/rwsem: Add a new RWSEM_WRITER_OWNED_NOSPIN flag Waiman Long
@ 2018-05-15  6:59   ` Amir Goldstein
  2018-05-15  8:25   ` Peter Zijlstra
  1 sibling, 0 replies; 18+ messages in thread
From: Amir Goldstein @ 2018-05-15  6:59 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel,
	linux-fsdevel, Davidlohr Bueso, Theodore Y. Ts'o,
	Oleg Nesterov, Jan Kara

On Mon, May 14, 2018 at 10:31 PM, Waiman Long <longman@redhat.com> wrote:
> There are use cases where a rwsem can be acquired by one task, but
> released by another task. In these cases, it may not be appropriate
> for the lock waiters to spin on the task that acquires the lock.
> One example will be the filesystem freeze/thaw code.
>
> To handle such use cases, a new RWSEM_WRITER_OWNED_NOSPIN
> flag can now be set in the owner field of the rwsem by the new
> rwsem_set_writer_owned_nospin() function to indicate that the rwsem is
> writer owned, but optimistic spinning on the rwsem should be disabled.
>
> Later on, the new rwsem_set_writer_owned() function can be called to
> set the new owner, if it is known. This function should not be called
> without a prior rwsem_set_writer_owned_nospin() call.
>
> Signed-off-by: Waiman Long <longman@redhat.com>

Makes sense to me. One nit.

>
> +static inline void __rwsem_set_writer_owned(struct rw_semaphore *sem,
> +                                           struct task_struct *task)

rwsem_set_owner() doesn't take a task argument and IMO
__rwsem_set_writer_owned() shouldn't either.

Thanks,
Amir.

* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-15  5:42   ` Amir Goldstein
@ 2018-05-15  7:04     ` Amir Goldstein
  2018-05-15 13:45     ` Waiman Long
  1 sibling, 0 replies; 18+ messages in thread
From: Amir Goldstein @ 2018-05-15  7:04 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel,
	linux-fsdevel, Davidlohr Bueso, Theodore Y. Ts'o,
	Oleg Nesterov, Jan Kara

On Tue, May 15, 2018 at 8:42 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, May 14, 2018 at 10:31 PM, Waiman Long <longman@redhat.com> wrote:
>> The percpu_rwsem_release() is called when the ownership of the embedded
>> rwsem is to be transferred to another task. The new owner, however, may
>> take a while to get the ownership of the lock via percpu_rwsem_acquire().
>> During that period, the rwsem is now marked as writer-owned with no
>> optimistic spinning.
>>
>
> Waiman,
>
> Thanks for the fix. I will test it soon.
>
> For this commit message I suggest that you add parts of the reproducer
> found here:
> https://marc.info/?l=linux-fsdevel&m=152622016219975&w=2
>

fsfreeze is happy with these changes.

You may add:
Tested-by: Amir Goldstein <amir73il@gmail.com>

Thanks,
Amir.

* Re: [RFC PATCH v2 1/2] locking/rwsem: Add a new RWSEM_WRITER_OWNED_NOSPIN flag
  2018-05-14 19:31 ` [RFC PATCH v2 1/2] locking/rwsem: Add a new RWSEM_WRITER_OWNED_NOSPIN flag Waiman Long
  2018-05-15  6:59   ` Amir Goldstein
@ 2018-05-15  8:25   ` Peter Zijlstra
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2018-05-15  8:25 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Thomas Gleixner, linux-kernel, linux-fsdevel,
	Davidlohr Bueso, Theodore Y. Ts'o, Oleg Nesterov,
	Amir Goldstein, Jan Kara

On Mon, May 14, 2018 at 03:31:06PM -0400, Waiman Long wrote:
> There are use cases where a rwsem can be acquired by one task, but
> released by another task. In these cases, it may not be appropriate
> for the lock waiters to spin on the task that acquires the lock.
> One example will be the filesystem freeze/thaw code.
> 
> To handle such use cases, a new RWSEM_WRITER_OWNED_NOSPIN
> flag can now be set in the owner field of the rwsem by the new
> rwsem_set_writer_owned_nospin() function to indicate that the rwsem is
> writer owned, but optimistic spinning on the rwsem should be disabled.
> 
> Later on, the new rwsem_set_writer_owned() function can be called to
> set the new owner, if it is known. This function should not be called
> without a prior rwsem_set_writer_owned_nospin() call.

Urgh.. no please don't do this. Aside from the horrible naming, do not
expose 'set-owner' semantics. Can't we just stick to the existing
_non_owner() interface without further polluting the API?

* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-14 19:31 ` [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release() Waiman Long
  2018-05-15  5:42   ` Amir Goldstein
@ 2018-05-15  8:35   ` Peter Zijlstra
  2018-05-15  9:00     ` Jan Kara
  2018-05-15  8:51   ` Peter Zijlstra
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2018-05-15  8:35 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Thomas Gleixner, linux-kernel, linux-fsdevel,
	Davidlohr Bueso, Theodore Y. Ts'o, Oleg Nesterov,
	Amir Goldstein, Jan Kara

On Mon, May 14, 2018 at 03:31:07PM -0400, Waiman Long wrote:
> The percpu_rwsem_release() is called when the ownership of the embedded
> rwsem is to be transferred to another task. The new owner, however, may
> take a while to get the ownership of the lock via percpu_rwsem_acquire().
> During that period, the rwsem is now marked as writer-owned with no
> optimistic spinning.

This does not explain the problem sufficiently to even begin considering
if the proposed solution is sensible.

* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-14 19:31 ` [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release() Waiman Long
  2018-05-15  5:42   ` Amir Goldstein
  2018-05-15  8:35   ` Peter Zijlstra
@ 2018-05-15  8:51   ` Peter Zijlstra
  2018-05-15 11:06     ` Oleg Nesterov
  2018-05-15 13:57     ` Waiman Long
  2 siblings, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2018-05-15  8:51 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Thomas Gleixner, linux-kernel, linux-fsdevel,
	Davidlohr Bueso, Theodore Y. Ts'o, Oleg Nesterov,
	Amir Goldstein, Jan Kara

On Mon, May 14, 2018 at 03:31:07PM -0400, Waiman Long wrote:
> The percpu_rwsem_release() is called when the ownership of the embedded
> rwsem is to be transferred to another task. The new owner, however, may
> take a while to get the ownership of the lock via percpu_rwsem_acquire().
> During that period, the rwsem is now marked as writer-owned with no
> optimistic spinning.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  include/linux/percpu-rwsem.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
> index b1f37a8..dd37102 100644
> --- a/include/linux/percpu-rwsem.h
> +++ b/include/linux/percpu-rwsem.h
> @@ -131,16 +131,16 @@ static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
>  					bool read, unsigned long ip)
>  {
>  	lock_release(&sem->rw_sem.dep_map, 1, ip);
> -#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
>  	if (!read)
> -		sem->rw_sem.owner = NULL;
> -#endif
> +		rwsem_set_writer_owned_nospin(&sem->rw_sem);
>  }
>  
>  static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem,
>  					bool read, unsigned long ip)
>  {
>  	lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL, ip);
> +	if (!read)
> +		rwsem_set_writer_owned(&sem->rw_sem, current);
>  }

So what's wrong with adding:

	if (!read)
		sem->rw_sem.owner = current;

?

Afaict the whole .owner=NULL thing in release already stops the spinners
dead, and the above 'fixes' the debug splat. And this avoids exposing
that horrible interface and keeps the mucking private to
rwsem/percpu_rwsem.

* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-15  8:35   ` Peter Zijlstra
@ 2018-05-15  9:00     ` Jan Kara
  2018-05-15 11:33       ` Oleg Nesterov
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2018-05-15  9:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, Thomas Gleixner, linux-kernel,
	linux-fsdevel, Davidlohr Bueso, Theodore Y. Ts'o,
	Oleg Nesterov, Amir Goldstein, Jan Kara

On Tue 15-05-18 10:35:25, Peter Zijlstra wrote:
> On Mon, May 14, 2018 at 03:31:07PM -0400, Waiman Long wrote:
> > The percpu_rwsem_release() is called when the ownership of the embedded
> > rwsem is to be transferred to another task. The new owner, however, may
> > take a while to get the ownership of the lock via percpu_rwsem_acquire().
> > During that period, the rwsem is now marked as writer-owned with no
> > optimistic spinning.
> 
> This does not explain the problem sufficiently to even begin considering
> if the proposed solution is sensible.

So the original problem is the following: There is a percpu_rw_semaphore in
the super_block which is used to implement filesystem freezing (actually
three of them, but that's not really important here). This semaphore is
acquired for writing when a fs is frozen (i.e., in response to a syscall) and
we return to userspace with this semaphore held. Later someone else calls
another syscall to unfreeze the filesystem, which drops the semaphore.

Now this behavior upsets lockdep, and that's why we fool it by telling it the
semaphore got released before returning to userspace (through the
percpu_rwsem_release() helper), and similarly we tell lockdep we've got the
semaphore when an unfreeze syscall is called, via percpu_rwsem_acquire(). Now
Amir has discovered that the rwsem debugging code also gets confused by this
behavior, and previously someone also noticed that rwsem spinning does not
make sense and can be broken by this behavior. So these patches from Waiman
try to fix up all these problems...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-15  8:51   ` Peter Zijlstra
@ 2018-05-15 11:06     ` Oleg Nesterov
  2018-05-15 11:51       ` Peter Zijlstra
  2018-05-15 13:57     ` Waiman Long
  1 sibling, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2018-05-15 11:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, Thomas Gleixner, linux-kernel,
	linux-fsdevel, Davidlohr Bueso, Theodore Y. Ts'o,
	Amir Goldstein, Jan Kara

On 05/15, Peter Zijlstra wrote:
>
> So what's wrong with adding:
>
> 	if (!read)
> 		sem->rw_sem.owner = current;

Agreed, I have already suggested this change twice. Except we obviously
need to check CONFIG_RWSEM_SPIN_ON_OWNER (->owner doesn't exist otherwise)
or even CONFIG_DEBUG_RWSEMS to make the purpose clearer.

> Afaict the whole .owner=NULL thing in release already stops the spinners

Not really, the new writer will spin in this case, afaics.

But this is another problem and probably we do not care. The new writer is
almost impossible in this particular case, another freeze_super() should
notice frozen != SB_UNFROZEN and return EBUSY.

> and the above 'fixes' the debug splat.

Yes.

Waiman, can't we trivially fix the problem first? Then we can add the helpers
and think about other improvements.

Oleg.

* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-15  9:00     ` Jan Kara
@ 2018-05-15 11:33       ` Oleg Nesterov
  0 siblings, 0 replies; 18+ messages in thread
From: Oleg Nesterov @ 2018-05-15 11:33 UTC (permalink / raw)
  To: Jan Kara
  Cc: Peter Zijlstra, Waiman Long, Ingo Molnar, Thomas Gleixner,
	linux-kernel, linux-fsdevel, Davidlohr Bueso,
	Theodore Y. Ts'o, Amir Goldstein

On 05/15, Jan Kara wrote:
>
> Now this behavior upsets lockdep, and that's why we fool it by telling it the
> semaphore got released before returning to userspace (through the
> percpu_rwsem_release() helper), and similarly we tell lockdep we've got the
> semaphore when an unfreeze syscall is called, via percpu_rwsem_acquire(). Now
> Amir has discovered that the rwsem debugging code also gets confused by this
> behavior

Yes, plus someone else already reported the problem a month ago.

> and previously someone also noticed that rwsem spinning does not
> make sense and can be broken by this behavior.

Well, this doesn't really matter but again, freeze_super() checks
frozen == SB_UNFROZEN under sb->s_umount and only then does sb_wait_write(),
when the previous writer has already released this lock. So the new writer
will never spin after lockdep_sb_freeze_release() clears ->owner.

Oleg.

* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-15 11:06     ` Oleg Nesterov
@ 2018-05-15 11:51       ` Peter Zijlstra
  2018-05-15 12:45         ` Oleg Nesterov
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2018-05-15 11:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Waiman Long, Ingo Molnar, Thomas Gleixner, linux-kernel,
	linux-fsdevel, Davidlohr Bueso, Theodore Y. Ts'o,
	Amir Goldstein, Jan Kara

On Tue, May 15, 2018 at 01:06:33PM +0200, Oleg Nesterov wrote:
> On 05/15, Peter Zijlstra wrote:
> >
> > So what's wrong with adding:
> >
> > 	if (!read)
> > 		sem->rw_sem.owner = current;
> 
> Agreed, I have already suggested this change twice. Except we obviously
> need to check CONFIG_RWSEM_SPIN_ON_OWNER (->owner doesn't exist otherwise)
> or even CONFIG_DEBUG_RWSEMS to make the purpose clearer.

Right, details ;-)

> > Afaict the whole .owner=NULL thing in release already stops the spinners
> 
> Not really, the new writer will spin in this case, afaics.
> 
> But this is another problem and probably we do not care. The new writer is
> almost impossible in this particular case, another freeze_super() should
> notice frozen != SB_UNFROZEN and return EBUSY.

rwsem_spin_on_owner() checks rwsem_owner_is_writer(), which does owner
&& owner != RWSEM_READER_OWNED, which will fail for !owner.

Or am I completely confused again?

> > and the above 'fixes' the debug splat.
> 
> Yes.
> 
> Waiman, can't we trivially fix the problem first? Then we can add the helpers
> and think about other improvements.

It is really simple; we're not going to add public (and EXPORT'ed to
boot) interfaces to rwsem for this.

* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-15 11:51       ` Peter Zijlstra
@ 2018-05-15 12:45         ` Oleg Nesterov
  2018-05-15 12:58           ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2018-05-15 12:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, Thomas Gleixner, linux-kernel,
	linux-fsdevel, Davidlohr Bueso, Theodore Y. Ts'o,
	Amir Goldstein, Jan Kara

On 05/15, Peter Zijlstra wrote:
>
> > > Afaict the whole .owner=NULL thing in release already stops the spinners
> >
> > Not really, the new writer will spin in this case, afaics.
> >
> > But this is another problem and probably we do not care. The new writer is
> > almost impossible in this particular case, another freeze_super() should
> > notice frozen != SB_UNFROZEN and return EBUSY.
>
> rwsem_spin_on_owner() checks rwsem_owner_is_writer(), which does owner
> && owner != RWSEM_READER_OWNED, which will fail for !owner.

Yep. So rwsem_spin_on_owner() goes to "out:" and returns
!rwsem_owner_is_reader() == T.

IOW, afaics owner == NULL means "spin unconditionally". I guess this is for
the case when the new writer is going to do rwsem_set_owner() or up_write()
has already called rwsem_clear_owner() but didn't do __up_write() yet.

Probably makes sense, but the code is not very clean.

> Or am I completely confused again?

Or me, I am not sure.

Oleg.

* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-15 12:45         ` Oleg Nesterov
@ 2018-05-15 12:58           ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2018-05-15 12:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Waiman Long, Ingo Molnar, Thomas Gleixner, linux-kernel,
	linux-fsdevel, Davidlohr Bueso, Theodore Y. Ts'o,
	Amir Goldstein, Jan Kara

On Tue, May 15, 2018 at 02:45:32PM +0200, Oleg Nesterov wrote:
> On 05/15, Peter Zijlstra wrote:
> >
> > > > Afaict the whole .owner=NULL thing in release already stops the spinners
> > >
> > > Not really, the new writer will spin in this case, afaics.
> > >
> > > But this is another problem and probably we do not care. The new writer is
> > > almost impossible in this particular case, another freeze_super() should
> > > notice frozen != SB_UNFROZEN and return EBUSY.
> >
> > rwsem_spin_on_owner() checks rwsem_owner_is_writer(), which does owner
> > && owner != RWSEM_READER_OWNED, which will fail for !owner.
> 
> Yep. So rwsem_spin_on_owner() goes to "out:" and returns
> !rwsem_owner_is_reader() == T.
> 
> IOW, afaics owner == NULL means "spin unconditionally". I guess this is for
> the case when the new writer is going to do rwsem_set_owner() or up_write()
> has already called rwsem_clear_owner() but didn't do __up_write() yet.
> 
> Probably makes sense, but the code is not very clean.

Arrgh, you're right... I hate this rwsem code.

Some day I'll finish the atomic_long_t version, which, similar to mutex,
merges the owner and 'count' fields.

* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-15  5:42   ` Amir Goldstein
  2018-05-15  7:04     ` Amir Goldstein
@ 2018-05-15 13:45     ` Waiman Long
  1 sibling, 0 replies; 18+ messages in thread
From: Waiman Long @ 2018-05-15 13:45 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel,
	linux-fsdevel, Davidlohr Bueso, Theodore Y. Ts'o,
	Oleg Nesterov, Jan Kara

On 05/15/2018 01:42 AM, Amir Goldstein wrote:
> On Mon, May 14, 2018 at 10:31 PM, Waiman Long <longman@redhat.com> wrote:
>> The percpu_rwsem_release() is called when the ownership of the embedded
>> rwsem is to be transferred to another task. The new owner, however, may
>> take a while to get the ownership of the lock via percpu_rwsem_acquire().
>> During that period, the rwsem is now marked as writer-owned with no
>> optimistic spinning.
>>
> Waiman,
>
> Thanks for the fix. I will test it soon.
>
> For this commit message I suggest that you add parts of the reproducer
> found here:
> https://marc.info/?l=linux-fsdevel&m=152622016219975&w=2
>
> Thanks,
> Amir.
Sure. I will add that to the commit log.

Cheers,
Longman


* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-15  8:51   ` Peter Zijlstra
  2018-05-15 11:06     ` Oleg Nesterov
@ 2018-05-15 13:57     ` Waiman Long
  2018-05-15 14:00       ` Matthew Wilcox
  1 sibling, 1 reply; 18+ messages in thread
From: Waiman Long @ 2018-05-15 13:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, linux-kernel, linux-fsdevel,
	Davidlohr Bueso, Theodore Y. Ts'o, Oleg Nesterov,
	Amir Goldstein, Jan Kara

On 05/15/2018 04:51 AM, Peter Zijlstra wrote:
> On Mon, May 14, 2018 at 03:31:07PM -0400, Waiman Long wrote:
>> The percpu_rwsem_release() is called when the ownership of the embedded
>> rwsem is to be transferred to another task. The new owner, however, may
>> take a while to get the ownership of the lock via percpu_rwsem_acquire().
>> During that period, the rwsem is now marked as writer-owned with no
>> optimistic spinning.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  include/linux/percpu-rwsem.h | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
>> index b1f37a8..dd37102 100644
>> --- a/include/linux/percpu-rwsem.h
>> +++ b/include/linux/percpu-rwsem.h
>> @@ -131,16 +131,16 @@ static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
>>  					bool read, unsigned long ip)
>>  {
>>  	lock_release(&sem->rw_sem.dep_map, 1, ip);
>> -#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
>>  	if (!read)
>> -		sem->rw_sem.owner = NULL;
>> -#endif
>> +		rwsem_set_writer_owned_nospin(&sem->rw_sem);
>>  }
>>  
>>  static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem,
>>  					bool read, unsigned long ip)
>>  {
>>  	lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL, ip);
>> +	if (!read)
>> +		rwsem_set_writer_owned(&sem->rw_sem, current);
>>  }
> So what's wrong with adding:
>
> 	if (!read)
> 		sem->rw_sem.owner = current;
>
> ?

Yes, we can certainly do that within a "#ifdef" block.

>
> Afaict the whole .owner=NULL thing in release already stops the spinners
> dead, and the above 'fixes' the debug splat. And this avoids exposing
> that horrible interface and keeps the mucking private to
> rwsem/percpu_rwsem.

Actually, setting owner to NULL does not stop spinning. The code just
assumes that the lock is about to be freed and keeps spinning in the
outer loop. We need some special value to indicate that spinning should
be stopped. How about just exposing a special value for that in
linux/rwsem.h? Any suggestions for a good name?

Cheers,
Longman


* Re: [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release()
  2018-05-15 13:57     ` Waiman Long
@ 2018-05-15 14:00       ` Matthew Wilcox
  0 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox @ 2018-05-15 14:00 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, linux-kernel,
	linux-fsdevel, Davidlohr Bueso, Theodore Y. Ts'o,
	Oleg Nesterov, Amir Goldstein, Jan Kara

On Tue, May 15, 2018 at 09:57:44AM -0400, Waiman Long wrote:
> > Afaict the whole .owner=NULL thing in release already stops the spinners
> > dead, and the above 'fixes' the debug splat. And this avoids exposing
> > that horrible interface and keeps the mucking private to
> > rwsem/percpu_rwsem.
> 
> Actually, setting owner to NULL does not stop spinning. The code just
> assumes that the lock is about to be freed and keeps spinning in the
> outer loop. We need some special value to indicate that spinning should
> be stopped. How about just exposing a special value for that in
> linux/rwsem.h? Any suggestions for a good name?

RWSEM_NO_OWNER
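
A rough sketch of how such a sentinel might behave, written as a user-space model rather than a patch. Everything here is hypothetical: the numeric value chosen for `RWSEM_NO_OWNER`, the `struct sem_model` type, and the helpers `may_spin()`, `model_release()` and `model_acquire()` do not come from the series; they only illustrate the idea of a distinct "no owner, do not spin" value that release sets and acquire replaces with the new owner.

```c
/* User-space sketch of a "no owner, do not spin" sentinel; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task { int id; };	/* stand-in for struct task_struct */

/* Assumed encodings: non-NULL values that can never be a real task. */
#define RWSEM_READER_OWNED ((struct task *)1UL)
#define RWSEM_NO_OWNER     ((struct task *)2UL)	/* hypothetical value */

struct sem_model {
	const struct task *owner;	/* models the rwsem owner field */
};

/*
 * NULL keeps its existing meaning (lock about to be freed or taken, so
 * spinning is allowed); the two sentinels both mean "do not spin".
 */
static bool may_spin(const struct task *owner)
{
	return owner != RWSEM_READER_OWNED && owner != RWSEM_NO_OWNER;
}

/* Ownership is being handed off: mark the lock as not worth spinning on. */
static void model_release(struct sem_model *sem)
{
	sem->owner = RWSEM_NO_OWNER;
}

/* The new owner takes over and records itself as the writer. */
static void model_acquire(struct sem_model *sem, const struct task *me)
{
	sem->owner = me;
}
```

With this model, a writer arriving between `model_release()` and `model_acquire()` sees `may_spin()` return false, while a plain NULL owner still allows spinning as before.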


end of thread, other threads:[~2018-05-15 14:00 UTC | newest]

Thread overview: 18+ messages
2018-05-14 19:31 [RFC PATCH v2 0/2] locking/rwsem: Fix DEBUG_RWSEM warning from thaw_sup Waiman Long
2018-05-14 19:31 ` [RFC PATCH v2 1/2] locking/rwsem: Add a new RWSEM_WRITER_OWNED_NOSPIN flag Waiman Long
2018-05-15  6:59   ` Amir Goldstein
2018-05-15  8:25   ` Peter Zijlstra
2018-05-14 19:31 ` [RFC PATCH v2 2/2] locking/percpu-rwsem: Mark rwsem as non-spinnable in percpu_rwsem_release() Waiman Long
2018-05-15  5:42   ` Amir Goldstein
2018-05-15  7:04     ` Amir Goldstein
2018-05-15 13:45     ` Waiman Long
2018-05-15  8:35   ` Peter Zijlstra
2018-05-15  9:00     ` Jan Kara
2018-05-15 11:33       ` Oleg Nesterov
2018-05-15  8:51   ` Peter Zijlstra
2018-05-15 11:06     ` Oleg Nesterov
2018-05-15 11:51       ` Peter Zijlstra
2018-05-15 12:45         ` Oleg Nesterov
2018-05-15 12:58           ` Peter Zijlstra
2018-05-15 13:57     ` Waiman Long
2018-05-15 14:00       ` Matthew Wilcox
