* [PATCH v3 0/5] locking/rwsem: Fix rwsem bugs & enable true lock handoff
@ 2022-10-17 21:13 Waiman Long
  2022-10-17 21:13 ` [PATCH v3 1/5] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-17 21:13 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷,
	Waiman Long

v3:
 - Make a minor cleanup to patch 1.
 - Add 3 more patches to implement true lock handoff.

v2:
 - Add an additional patch to limit the # of first waiter optimistic
   spinning in the writer slowpath.

It turns out the current waiter optimistic spinning code does not work
that well if we have RT tasks in the mix. This patch series includes two
different fixes to resolve those issues. The last three patches modify the
handoff code to implement a true lock handoff similar to that of a mutex.
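
For illustration only, a conceptual sketch of what a "true" handoff means
here (this is not code from the patches; wake_up_process() is used purely
to keep the sketch short): the unlock path transfers ownership directly to
the first waiter instead of merely disabling optimistic spinning and lock
stealing while that waiter retries.

        /*
         * Conceptual sketch, assuming a write waiter at the head of the
         * wait queue; the real implementation is rwsem_handoff() in patch 4.
         */
        first = rwsem_first_waiter(sem);
        atomic_long_set(&sem->owner, (long)first->task); /* hand over ownership */
        wake_up_process(first->task);  /* waiter wakes up already holding the lock */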

Waiman Long (5):
  locking/rwsem: Prevent non-first waiter from spinning in down_write()
    slowpath
  locking/rwsem: Limit # of null owner retries for handoff writer
  locking/rwsem: Change waiter->handoff_set to a handoff_state enum
  locking/rwsem: Enable direct rwsem lock handoff
  locking/rwsem: Update handoff lock events tracking

 kernel/locking/lock_events_list.h |   6 +-
 kernel/locking/rwsem.c            | 172 +++++++++++++++++++++++-------
 2 files changed, 138 insertions(+), 40 deletions(-)

-- 
2.31.1



* [PATCH v3 1/5] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-17 21:13 [PATCH v3 0/5] locking/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
@ 2022-10-17 21:13 ` Waiman Long
  2022-10-24 13:17   ` Peter Zijlstra
  2022-10-17 21:13 ` [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer Waiman Long
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2022-10-17 21:13 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷,
	Waiman Long

A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping and yet fail to acquire the
lock even when the rwsem is free, if the following sequence happens:

  Non-first waiter       First waiter      Lock holder
  ----------------       ------------      -----------
  Acquire wait_lock
  rwsem_try_write_lock():
    Set handoff bit if RT or
      wait too long
    Set waiter->handoff_set
  Release wait_lock
                         Acquire wait_lock
                         Inherit waiter->handoff_set
                         Release wait_lock
					   Clear owner
                                           Release lock
  if (waiter.handoff_set) {
    rwsem_spin_on_owner();
    if (OWNER_NULL)
      goto trylock_again;
  }
  trylock_again:
  Acquire wait_lock
  rwsem_try_write_lock():
     if (first->handoff_set && (waiter != first))
     	return false;
  Release wait_lock

It is especially problematic if the non-first waiter is an RT task
running on the same CPU as the first waiter, as this can lead to a
livelock.
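
A condensed sketch of the fix (simplified from the diff below): only the
first waiter is ever marked handoff_set, so a non-first waiter falls
through to schedule() instead of spinning on a handoff it can never be
granted.

        /* in rwsem_try_write_lock(), with wait_lock held */
        if (new & RWSEM_FLAG_HANDOFF) {
                first->handoff_set = true;   /* was: waiter->handoff_set = true */
                lockevent_inc(rwsem_wlock_handoff);
                return false;
        }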

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Reviewed-and-tested-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 44873594de03..be2df9ea7c30 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -624,18 +624,16 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 			 */
 			if (first->handoff_set && (waiter != first))
 				return false;
-
-			/*
-			 * First waiter can inherit a previously set handoff
-			 * bit and spin on rwsem if lock acquisition fails.
-			 */
-			if (waiter == first)
-				waiter->handoff_set = true;
 		}
 
 		new = count;
 
 		if (count & RWSEM_LOCK_MASK) {
+			/*
+			 * A waiter (first or not) can set the handoff bit
+			 * if it is an RT task or wait in the wait queue
+			 * for too long.
+			 */
 			if (has_handoff || (!rt_task(waiter->task) &&
 					    !time_after(jiffies, waiter->timeout)))
 				return false;
@@ -651,11 +649,12 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
 
 	/*
-	 * We have either acquired the lock with handoff bit cleared or
-	 * set the handoff bit.
+	 * We have either acquired the lock with handoff bit cleared or set
+	 * the handoff bit. Only the first waiter can have its handoff_set
+	 * set here to enable optimistic spinning in slowpath loop.
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
-		waiter->handoff_set = true;
+		first->handoff_set = true;
 		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
-- 
2.31.1



* [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer
  2022-10-17 21:13 [PATCH v3 0/5] locking/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
  2022-10-17 21:13 ` [PATCH v3 1/5] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
@ 2022-10-17 21:13 ` Waiman Long
  2022-10-24 13:31   ` Peter Zijlstra
  2022-10-17 21:13 ` [PATCH v3 3/5] locking/rwsem: Change waiter->handoff_set to a handoff_state enum Waiman Long
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2022-10-17 21:13 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷,
	Waiman Long

Commit 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically
spin on owner") assumes that when the owner field is changed to NULL,
the lock will become free soon. That assumption may not be correct,
especially if the handoff writer doing the spinning is an RT task which
may preempt another task and prevent it from completing its action of
either freeing the rwsem or properly setting up the owner.

To prevent this livelock scenario, we have to limit the number of
trylock attempts without sleeping. The limit is currently set to 8 to
allow enough time for the other task to hopefully complete its action.

New lock events were added to track the number of NULL owner retries
with the handoff flag set before a successful trylock. Running a
96-thread locking microbenchmark with an equal number of readers and
writers on a 2-core 96-thread system for 15 seconds produced the
following stats. Note that none of the locking threads are RT tasks.

  Retries of successful trylock    Count
  -----------------------------    -----
             1                     1738
             2                       19
             3                       11
             4                        2
             5                        1
             6                        1
             7                        1
             8                        0
             X                        1

The last row is the one attempt that needed more than 8 retries. So a
maximum retry count of 8 should capture most of them if no RT task is
in the mix.
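
A condensed sketch of the resulting write slowpath loop (simplified from
the diff below; the real code uses a trylock_again label and re-takes
wait_lock around each trylock attempt):

        for (null_owner_retries = 0;;) {
                if (rwsem_try_write_lock(sem, &waiter))
                        break;                          /* lock acquired */

                if (waiter.handoff_set &&
                    rwsem_spin_on_owner(sem) == OWNER_NULL &&
                    null_owner_retries < 8) {
                        null_owner_retries++;
                        continue;                       /* retry without sleeping */
                }

                null_owner_retries = 0;
                schedule();             /* give the other task time to finish */
        }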

Fixes: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-and-tested-by: Mukesh Ojha <quic_mojha@quicinc.com>
---
 kernel/locking/rwsem.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index be2df9ea7c30..c68d76fc8c68 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1115,6 +1115,7 @@ static struct rw_semaphore __sched *
 rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
+	int null_owner_retries;
 	DEFINE_WAKE_Q(wake_q);
 
 	/* do optimistic spinning and steal lock if possible */
@@ -1156,7 +1157,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	set_current_state(state);
 	trace_contention_begin(sem, LCB_F_WRITE);
 
-	for (;;) {
+	for (null_owner_retries = 0;;) {
 		if (rwsem_try_write_lock(sem, &waiter)) {
 			/* rwsem_try_write_lock() implies ACQUIRE on success */
 			break;
@@ -1182,8 +1183,21 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 			owner_state = rwsem_spin_on_owner(sem);
 			preempt_enable();
 
-			if (owner_state == OWNER_NULL)
+			/*
+			 * owner is NULL doesn't guarantee the lock is free.
+			 * An incoming reader will temporarily increment the
+			 * reader count without changing owner and the
+			 * rwsem_try_write_lock() will fails if the reader
+			 * is not able to decrement it in time. Allow 8
+			 * trylock attempts when hitting a NULL owner before
+			 * going to sleep.
+			 */
+			if ((owner_state == OWNER_NULL) &&
+			    (null_owner_retries < 8)) {
+				null_owner_retries++;
 				goto trylock_again;
+			}
+			null_owner_retries = 0;
 		}
 
 		schedule();
-- 
2.31.1



* [PATCH v3 3/5] locking/rwsem: Change waiter->handoff_set to a handoff_state enum
  2022-10-17 21:13 [PATCH v3 0/5] locking/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
  2022-10-17 21:13 ` [PATCH v3 1/5] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
  2022-10-17 21:13 ` [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer Waiman Long
@ 2022-10-17 21:13 ` Waiman Long
  2022-10-17 21:13 ` [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff Waiman Long
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-17 21:13 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷,
	Waiman Long

Change the boolean waiter->handoff_set to an enum type so that more
states can be added in later patches. Also use READ_ONCE() for reads
outside wait_lock critical sections and WRITE_ONCE() for writes inside
wait_lock critical sections for proper synchronization. There is no
functional change.
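
A condensed view of the convention this patch introduces (taken from the
diff below):

        enum rwsem_handoff_state {
                HANDOFF_NONE = 0,
                HANDOFF_REQUESTED,
        };

        /* writes are done with wait_lock held */
        WRITE_ONCE(waiter->handoff_state, HANDOFF_REQUESTED);

        /* lockless reads outside the wait_lock critical section,
         * e.g. in the rwsem_down_write_slowpath() loop
         */
        if (READ_ONCE(waiter.handoff_state))
                ...;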

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c68d76fc8c68..a8bfc905637a 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -338,12 +338,17 @@ enum rwsem_waiter_type {
 	RWSEM_WAITING_FOR_READ
 };
 
+enum rwsem_handoff_state {
+	HANDOFF_NONE = 0,
+	HANDOFF_REQUESTED,
+};
+
 struct rwsem_waiter {
 	struct list_head list;
 	struct task_struct *task;
 	enum rwsem_waiter_type type;
+	enum rwsem_handoff_state handoff_state;
 	unsigned long timeout;
-	bool handoff_set;
 };
 #define rwsem_first_waiter(sem) \
 	list_first_entry(&sem->wait_list, struct rwsem_waiter, list)
@@ -470,7 +475,7 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 					adjustment -= RWSEM_FLAG_HANDOFF;
 					lockevent_inc(rwsem_rlock_handoff);
 				}
-				waiter->handoff_set = true;
+				WRITE_ONCE(waiter->handoff_state, HANDOFF_REQUESTED);
 			}
 
 			atomic_long_add(-adjustment, &sem->count);
@@ -622,7 +627,7 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 			 * waiter is the one that set it. Otherwisee, we
 			 * still try to acquire the rwsem.
 			 */
-			if (first->handoff_set && (waiter != first))
+			if (first->handoff_state && (waiter != first))
 				return false;
 		}
 
@@ -650,11 +655,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 
 	/*
 	 * We have either acquired the lock with handoff bit cleared or set
-	 * the handoff bit. Only the first waiter can have its handoff_set
+	 * the handoff bit. Only the first waiter can have its handoff_state
 	 * set here to enable optimistic spinning in slowpath loop.
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
-		first->handoff_set = true;
+		WRITE_ONCE(first->handoff_state, HANDOFF_REQUESTED);
 		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
@@ -1043,7 +1048,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
-	waiter.handoff_set = false;
+	waiter.handoff_state = HANDOFF_NONE;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list)) {
@@ -1131,7 +1136,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_WRITE;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
-	waiter.handoff_set = false;
+	waiter.handoff_state = HANDOFF_NONE;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	rwsem_add_waiter(sem, &waiter);
@@ -1176,7 +1181,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		 * In this case, we attempt to acquire the lock again
 		 * without sleeping.
 		 */
-		if (waiter.handoff_set) {
+		if (READ_ONCE(waiter.handoff_state)) {
 			enum owner_state owner_state;
 
 			preempt_disable();
-- 
2.31.1



* [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff
  2022-10-17 21:13 [PATCH v3 0/5] locking/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
                   ` (2 preceding siblings ...)
  2022-10-17 21:13 ` [PATCH v3 3/5] locking/rwsem: Change waiter->handoff_set to a handoff_state enum Waiman Long
@ 2022-10-17 21:13 ` Waiman Long
  2022-10-17 21:13 ` [PATCH v3 5/5] locking/rwsem: Update handoff lock events tracking Waiman Long
       [not found] ` <20221018111424.1007-1-hdanton@sina.com>
  5 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-17 21:13 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷,
	Waiman Long

The lock handoff provided in rwsem isn't a true handoff like that in
the mutex. Instead, it is more like a quiescent state where optimistic
spinning and lock stealing are disabled to make it easier for the first
waiter to acquire the lock.

Reworking the code to enable a true lock handoff is more complex due to
the following facts:
 1) The RWSEM_FLAG_HANDOFF bit is protected by the wait_lock and it
    is too expensive to always take the wait_lock in the unlock path
    to prevent racing.
 2) The reader lock fast path may add a RWSEM_READER_BIAS at the wrong
    time, preventing a proper lock handoff from a reader-owned rwsem.

A lock handoff can only be initiated when the following conditions are
true:
 1) The RWSEM_FLAG_HANDOFF bit is set.
 2) The task doing the handoff doesn't see any other active lock,
    excluding the lock that it might have held itself.

The new handoff mechanism performs handoff in rwsem_wakeup() to minimize
overhead. The rwsem count will be known at that point to determine if
handoff should be done. However, there is a small time gap between when
the rwsem becomes free and when the wait_lock is taken, in which a reader
can come in and add a RWSEM_READER_BIAS to the count, or the current
first waiter can take the rwsem and clear RWSEM_FLAG_HANDOFF. Either
case will fail the handoff operation. To handle the former case, a
secondary handoff will also be done in rwsem_down_read_slowpath() to
catch it.

With true lock handoff, there is no need to do NULL owner spinning
anymore as a wakeup will be performed if handoff is possible. So it
is likely that the first waiter won't actually go to sleep even when
schedule() is called in this case.
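
A condensed sketch of the two handoff trigger points added by this patch
(simplified from the diff below; both run with wait_lock held):

        /* primary handoff, in the unlock wakeup path */
        count = atomic_long_read(&sem->count);
        if (!(count & RWSEM_LOCK_MASK) && (count & RWSEM_FLAG_HANDOFF))
                rwsem_handoff(sem, 0, &wake_q);         /* lock is free: grant it */
        else
                rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);

        /* secondary handoff, in rwsem_down_read_slowpath() */
        if ((count & RWSEM_FLAG_HANDOFF) &&
            ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS))
                rwsem_handoff(sem, adjustment, &wake_q); /* only this reader's bias is left */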

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 141 +++++++++++++++++++++++++++++++----------
 1 file changed, 109 insertions(+), 32 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index a8bfc905637a..287606aee0e6 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -341,6 +341,7 @@ enum rwsem_waiter_type {
 enum rwsem_handoff_state {
 	HANDOFF_NONE = 0,
 	HANDOFF_REQUESTED,
+	HANDOFF_GRANTED,
 };
 
 struct rwsem_waiter {
@@ -489,6 +490,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 		 */
 		owner = waiter->task;
 		__rwsem_set_reader_owned(sem, owner);
+	} else if (waiter->handoff_state == HANDOFF_GRANTED) {
+		/*
+		 * rwsem_handoff() has added to count RWSEM_READER_BIAS of
+		 * the first waiter.
+		 */
+		adjustment = RWSEM_READER_BIAS;
 	}
 
 	/*
@@ -586,7 +593,7 @@ rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
 		      struct wake_q_head *wake_q)
 		      __releases(&sem->wait_lock)
 {
-	bool first = rwsem_first_waiter(sem) == waiter;
+	struct rwsem_waiter *first = rwsem_first_waiter(sem);
 
 	wake_q_init(wake_q);
 
@@ -595,8 +602,21 @@ rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
 	 * the first waiter, we wake up the remaining waiters as they may
 	 * be eligible to acquire or spin on the lock.
 	 */
-	if (rwsem_del_waiter(sem, waiter) && first)
+	if (rwsem_del_waiter(sem, waiter) && (waiter == first)) {
+		switch (waiter->handoff_state) {
+		case HANDOFF_GRANTED:
+			raw_spin_unlock_irq(&sem->wait_lock);
+			return;
+		case HANDOFF_REQUESTED:
+			/* Pass handoff state to the new first waiter */
+			first = rwsem_first_waiter(sem);
+			WRITE_ONCE(first->handoff_state, HANDOFF_REQUESTED);
+			fallthrough;
+		default:
+			break;
+		}
 		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, wake_q);
+	}
 	raw_spin_unlock_irq(&sem->wait_lock);
 	if (!wake_q_empty(wake_q))
 		wake_up_q(wake_q);
@@ -764,6 +784,11 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
 
 	owner = rwsem_owner_flags(sem, &flags);
 	state = rwsem_owner_state(owner, flags);
+
+	/* A handoff may have been granted */
+	if (!flags && (owner == current))
+		return OWNER_NONSPINNABLE;
+
 	if (state != OWNER_WRITER)
 		return state;
 
@@ -977,6 +1002,32 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
 }
 #endif
 
+/*
+ * Hand off the lock to the first waiter
+ */
+static void rwsem_handoff(struct rw_semaphore *sem, long adj,
+			  struct wake_q_head *wake_q)
+{
+	struct rwsem_waiter *waiter;
+	enum rwsem_wake_type wake_type;
+
+	lockdep_assert_held(&sem->wait_lock);
+	adj -= RWSEM_FLAG_HANDOFF;
+	waiter = rwsem_first_waiter(sem);
+	WRITE_ONCE(waiter->handoff_state, HANDOFF_GRANTED);
+	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
+		wake_type = RWSEM_WAKE_ANY;
+		adj += RWSEM_WRITER_LOCKED;
+		atomic_long_set(&sem->owner, (long)waiter->task);
+	} else {
+		wake_type = RWSEM_WAKE_READ_OWNED;
+		adj += RWSEM_READER_BIAS;
+		__rwsem_set_reader_owned(sem, waiter->task);
+	}
+	atomic_long_add(adj, &sem->count);
+	rwsem_mark_wake(sem, wake_type, wake_q);
+}
+
 /*
  * Prepare to wake up waiter(s) in the wait queue by putting them into the
  * given wake_q if the rwsem lock owner isn't a writer. If rwsem is likely
@@ -1051,6 +1102,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	waiter.handoff_state = HANDOFF_NONE;
 
 	raw_spin_lock_irq(&sem->wait_lock);
+	count = atomic_long_read(&sem->count);
 	if (list_empty(&sem->wait_list)) {
 		/*
 		 * In case the wait queue is empty and the lock isn't owned
@@ -1058,7 +1110,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 		 * immediately as its RWSEM_READER_BIAS has already been set
 		 * in the count.
 		 */
-		if (!(atomic_long_read(&sem->count) & RWSEM_WRITER_MASK)) {
+		if (!(count & RWSEM_WRITER_MASK)) {
 			/* Provide lock ACQUIRE */
 			smp_acquire__after_ctrl_dep();
 			raw_spin_unlock_irq(&sem->wait_lock);
@@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 			return sem;
 		}
 		adjustment += RWSEM_FLAG_WAITERS;
+	} else if ((count & RWSEM_FLAG_HANDOFF) &&
+		  ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
+		/*
+		 * If the waiter to be handed off is a reader, this reader
+		 * can piggyback on top of top of that.
+		 */
+		if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
+			adjustment = 0;
+		rwsem_handoff(sem, adjustment, &wake_q);
+
+		if (!adjustment) {
+			raw_spin_unlock_irq(&sem->wait_lock);
+			wake_up_q(&wake_q);
+			return sem;
+		}
+		adjustment = 0;
 	}
 	rwsem_add_waiter(sem, &waiter);
 
-	/* we're now waiting on the lock, but no longer actively locking */
-	count = atomic_long_add_return(adjustment, &sem->count);
-
-	rwsem_cond_wake_waiter(sem, count, &wake_q);
+	if (adjustment) {
+		/*
+		 * We are now waiting on the lock with no handoff, but no
+		 * longer actively locking.
+		 */
+		count = atomic_long_add_return(adjustment, &sem->count);
+		rwsem_cond_wake_waiter(sem, count, &wake_q);
+	}
 	raw_spin_unlock_irq(&sem->wait_lock);
 
 	if (!wake_q_empty(&wake_q))
@@ -1120,7 +1192,6 @@ static struct rw_semaphore __sched *
 rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
-	int null_owner_retries;
 	DEFINE_WAKE_Q(wake_q);
 
 	/* do optimistic spinning and steal lock if possible */
@@ -1162,7 +1233,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	set_current_state(state);
 	trace_contention_begin(sem, LCB_F_WRITE);
 
-	for (null_owner_retries = 0;;) {
+	for (;;) {
 		if (rwsem_try_write_lock(sem, &waiter)) {
 			/* rwsem_try_write_lock() implies ACQUIRE on success */
 			break;
@@ -1182,37 +1253,28 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		 * without sleeping.
 		 */
 		if (READ_ONCE(waiter.handoff_state)) {
-			enum owner_state owner_state;
-
-			preempt_disable();
-			owner_state = rwsem_spin_on_owner(sem);
-			preempt_enable();
-
-			/*
-			 * owner is NULL doesn't guarantee the lock is free.
-			 * An incoming reader will temporarily increment the
-			 * reader count without changing owner and the
-			 * rwsem_try_write_lock() will fails if the reader
-			 * is not able to decrement it in time. Allow 8
-			 * trylock attempts when hitting a NULL owner before
-			 * going to sleep.
-			 */
-			if ((owner_state == OWNER_NULL) &&
-			    (null_owner_retries < 8)) {
-				null_owner_retries++;
-				goto trylock_again;
+			if (READ_ONCE(waiter.handoff_state) == HANDOFF_REQUESTED) {
+				preempt_disable();
+				rwsem_spin_on_owner(sem);
+				preempt_enable();
 			}
-			null_owner_retries = 0;
+			if (READ_ONCE(waiter.handoff_state) == HANDOFF_GRANTED)
+				goto skip_sleep;
 		}
 
 		schedule();
 		lockevent_inc(rwsem_sleep_writer);
 		set_current_state(state);
-trylock_again:
+skip_sleep:
 		raw_spin_lock_irq(&sem->wait_lock);
+		if (waiter.handoff_state == HANDOFF_GRANTED) {
+			rwsem_del_waiter(sem, &waiter);
+			break;
+		}
 	}
 	__set_current_state(TASK_RUNNING);
 	raw_spin_unlock_irq(&sem->wait_lock);
+out_lock:
 	lockevent_inc(rwsem_wlock);
 	trace_contention_end(sem, 0);
 	return sem;
@@ -1221,6 +1283,9 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	__set_current_state(TASK_RUNNING);
 	raw_spin_lock_irq(&sem->wait_lock);
 	rwsem_del_wake_waiter(sem, &waiter, &wake_q);
+	if (unlikely(READ_ONCE(waiter.handoff_state) == HANDOFF_GRANTED))
+		goto out_lock;
+
 	lockevent_inc(rwsem_wlock_fail);
 	trace_contention_end(sem, -EINTR);
 	return ERR_PTR(-EINTR);
@@ -1232,12 +1297,24 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
  */
 static struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 {
-	unsigned long flags;
 	DEFINE_WAKE_Q(wake_q);
+	unsigned long flags;
+	long count;
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
-	if (!list_empty(&sem->wait_list))
+	if (list_empty(&sem->wait_list)) {
+		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+		return sem;
+	}
+	/*
+	 * If the rwsem is free and handoff flag is set with wait_lock held,
+	 * no other CPUs can take an active lock.
+	 */
+	count = atomic_long_read(&sem->count);
+	if (!(count & RWSEM_LOCK_MASK) && (count & RWSEM_FLAG_HANDOFF))
+		rwsem_handoff(sem, 0, &wake_q);
+	else
 		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
-- 
2.31.1



* [PATCH v3 5/5] locking/rwsem: Update handoff lock events tracking
  2022-10-17 21:13 [PATCH v3 0/5] locking/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
                   ` (3 preceding siblings ...)
  2022-10-17 21:13 ` [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff Waiman Long
@ 2022-10-17 21:13 ` Waiman Long
       [not found] ` <20221018111424.1007-1-hdanton@sina.com>
  5 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-17 21:13 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷,
	Waiman Long

With the new direct rwsem lock handoff, the corresponding handoff lock
events are updated to also track the number of secondary lock handoffs
in rwsem_down_read_slowpath() to see how prevalent those handoff
events are. The number of primary lock handoffs in the unlock paths is
(rwsem_handoff_read + rwsem_handoff_write - rwsem_handoff_rslow).

After running a 96-thread rwsem microbenchmark with an equal number
of readers and writers on a 2-socket 96-thread system for 40s, the
following handoff stats were obtained:

  rwsem_handoff_read=189
  rwsem_handoff_rslow=1
  rwsem_handoff_write=6678
  rwsem_handoff_wspin=6681

The number of primary handoffs was thus 189 + 6678 - 1 = 6866, whereas
there was only one secondary handoff in this test run.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lock_events_list.h | 6 ++++--
 kernel/locking/rwsem.c            | 9 +++++----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 97fb6f3f840a..04d101767c2c 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -63,7 +63,9 @@ LOCK_EVENT(rwsem_rlock)		/* # of read locks acquired		*/
 LOCK_EVENT(rwsem_rlock_steal)	/* # of read locks by lock stealing	*/
 LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired	*/
 LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions	*/
-LOCK_EVENT(rwsem_rlock_handoff)	/* # of read lock handoffs		*/
 LOCK_EVENT(rwsem_wlock)		/* # of write locks acquired		*/
 LOCK_EVENT(rwsem_wlock_fail)	/* # of failed write lock acquisitions	*/
-LOCK_EVENT(rwsem_wlock_handoff)	/* # of write lock handoffs		*/
+LOCK_EVENT(rwsem_handoff_read)	/* # of read lock handoffs		*/
+LOCK_EVENT(rwsem_handoff_write)	/* # of write lock handoffs		*/
+LOCK_EVENT(rwsem_handoff_rslow)	/* # of handoffs in read slowpath	*/
+LOCK_EVENT(rwsem_handoff_wspin)	/* # of handoff spins in write slowpath	*/
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 287606aee0e6..46aea1994bf8 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -472,10 +472,8 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 			 * force the issue.
 			 */
 			if (time_after(jiffies, waiter->timeout)) {
-				if (!(oldcount & RWSEM_FLAG_HANDOFF)) {
+				if (!(oldcount & RWSEM_FLAG_HANDOFF))
 					adjustment -= RWSEM_FLAG_HANDOFF;
-					lockevent_inc(rwsem_rlock_handoff);
-				}
 				WRITE_ONCE(waiter->handoff_state, HANDOFF_REQUESTED);
 			}
 
@@ -680,7 +678,6 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
 		WRITE_ONCE(first->handoff_state, HANDOFF_REQUESTED);
-		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
 
@@ -1019,10 +1016,12 @@ static void rwsem_handoff(struct rw_semaphore *sem, long adj,
 		wake_type = RWSEM_WAKE_ANY;
 		adj += RWSEM_WRITER_LOCKED;
 		atomic_long_set(&sem->owner, (long)waiter->task);
+		lockevent_inc(rwsem_handoff_write);
 	} else {
 		wake_type = RWSEM_WAKE_READ_OWNED;
 		adj += RWSEM_READER_BIAS;
 		__rwsem_set_reader_owned(sem, waiter->task);
+		lockevent_inc(rwsem_handoff_read);
 	}
 	atomic_long_add(adj, &sem->count);
 	rwsem_mark_wake(sem, wake_type, wake_q);
@@ -1128,6 +1127,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 		if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
 			adjustment = 0;
 		rwsem_handoff(sem, adjustment, &wake_q);
+		lockevent_inc(rwsem_handoff_rslow);
 
 		if (!adjustment) {
 			raw_spin_unlock_irq(&sem->wait_lock);
@@ -1257,6 +1257,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 				preempt_disable();
 				rwsem_spin_on_owner(sem);
 				preempt_enable();
+				lockevent_inc(rwsem_handoff_wspin);
 			}
 			if (READ_ONCE(waiter.handoff_state) == HANDOFF_GRANTED)
 				goto skip_sleep;
-- 
2.31.1



* Re: [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff
       [not found] ` <20221018111424.1007-1-hdanton@sina.com>
@ 2022-10-18 14:13   ` Mukesh Ojha
  2022-10-18 17:37     ` Waiman Long
       [not found]     ` <20221018235138.1088-1-hdanton@sina.com>
  0 siblings, 2 replies; 22+ messages in thread
From: Mukesh Ojha @ 2022-10-18 14:13 UTC (permalink / raw)
  To: Hillf Danton, Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, john.p.donnelly, Ting11 Wang 王婷

Hi,

On 10/18/2022 4:44 PM, Hillf Danton wrote:
> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
>>   			return sem;
>>   		}
>>   		adjustment += RWSEM_FLAG_WAITERS;
>> +	} else if ((count & RWSEM_FLAG_HANDOFF) &&
>> +		  ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
> 
> Could a couple of CPUs go read slow path in parallel?
> 
>> +		/*
>> +		 * If the waiter to be handed off is a reader, this reader
>> +		 * can piggyback on top of top of that.
>> +		 */
>> +		if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
>> +			adjustment = 0;
>> +		rwsem_handoff(sem, adjustment, &wake_q);
>> +
>> +		if (!adjustment) {
>> +			raw_spin_unlock_irq(&sem->wait_lock);
>> +			wake_up_q(&wake_q);
>> +			return sem;
>> +		}
>> +		adjustment = 0;
>>   	}
>>   	rwsem_add_waiter(sem, &waiter);
> 
> Why can this acquirer pigyback without becoming a waiter?
> 
>>   
>> -	/* we're now waiting on the lock, but no longer actively locking */
>> -	count = atomic_long_add_return(adjustment, &sem->count);
>> -
>> -	rwsem_cond_wake_waiter(sem, count, &wake_q);
>> +	if (adjustment) {
>> +		/*
>> +		 * We are now waiting on the lock with no handoff, but no
>> +		 * longer actively locking.
>> +		 */
>> +		count = atomic_long_add_return(adjustment, &sem->count);
>> +		rwsem_cond_wake_waiter(sem, count, &wake_q);
>> +	}
>>   	raw_spin_unlock_irq(&sem->wait_lock);
>>   
>>   	if (!wake_q_empty(&wake_q))
>> @@ -1120,7 +1192,6 @@ static struct rw_semaphore __sched *
>>   rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>>   {
>>   	struct rwsem_waiter waiter;
>> -	int null_owner_retries;
> 
> This reverts 2/5 and feel free to drop it directly.

I think he intends to tag the first two patches to go to stable branches.

-Mukesh
> 
> Hillf


* Re: [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff
  2022-10-18 14:13   ` [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff Mukesh Ojha
@ 2022-10-18 17:37     ` Waiman Long
       [not found]     ` <20221018235138.1088-1-hdanton@sina.com>
  1 sibling, 0 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-18 17:37 UTC (permalink / raw)
  To: Mukesh Ojha, Hillf Danton
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-kernel, john.p.donnelly, Ting11 Wang 王婷

On 10/18/22 10:13, Mukesh Ojha wrote:
> Hi,
>
> On 10/18/2022 4:44 PM, Hillf Danton wrote:
>> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore 
>>> *sem, long count, unsigned int stat
>>>               return sem;
>>>           }
>>>           adjustment += RWSEM_FLAG_WAITERS;
>>> +    } else if ((count & RWSEM_FLAG_HANDOFF) &&
>>> +          ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
>>
>> Could a couple of CPUs go read slow path in parallel?
This is under wait_lock protection. So no parallel execution is possible.
>>
>>> +        /*
>>> +         * If the waiter to be handed off is a reader, this reader
>>> +         * can piggyback on top of top of that.
>>> +         */
>>> +        if (rwsem_first_waiter(sem)->type == RWSEM_WAITING_FOR_READ)
>>> +            adjustment = 0;
>>> +        rwsem_handoff(sem, adjustment, &wake_q);
>>> +
>>> +        if (!adjustment) {
>>> +            raw_spin_unlock_irq(&sem->wait_lock);
>>> +            wake_up_q(&wake_q);
>>> +            return sem;
>>> +        }
>>> +        adjustment = 0;
>>>       }
>>>       rwsem_add_waiter(sem, &waiter);
>>
>> Why can this acquirer pigyback without becoming a waiter?
The idea is to have as much reader parallelism as possible without
writer starvation. In other words, a continuous stream of readers is not
allowed to block out a writer. However, there are places where allowing
one more reader to get the lock won't cause writer starvation.
>>
>>>   -    /* we're now waiting on the lock, but no longer actively 
>>> locking */
>>> -    count = atomic_long_add_return(adjustment, &sem->count);
>>> -
>>> -    rwsem_cond_wake_waiter(sem, count, &wake_q);
>>> +    if (adjustment) {
>>> +        /*
>>> +         * We are now waiting on the lock with no handoff, but no
>>> +         * longer actively locking.
>>> +         */
>>> +        count = atomic_long_add_return(adjustment, &sem->count);
>>> +        rwsem_cond_wake_waiter(sem, count, &wake_q);
>>> +    }
>>>       raw_spin_unlock_irq(&sem->wait_lock);
>>>         if (!wake_q_empty(&wake_q))
>>> @@ -1120,7 +1192,6 @@ static struct rw_semaphore __sched *
>>>   rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>>>   {
>>>       struct rwsem_waiter waiter;
>>> -    int null_owner_retries;
>>
>> This reverts 2/5 and feel free to drop it directly.
>
> I think, he intents to tag the first two patches to go to stable 
> branches.

This patch is too disruptive to go to the stable branches. Yes, I do 
intend the first 2 patches to go into stable branches.

Cheers,
Longman



* Re: [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff
       [not found]     ` <20221018235138.1088-1-hdanton@sina.com>
@ 2022-10-19  0:39       ` Waiman Long
       [not found]       ` <20221019022934.1166-1-hdanton@sina.com>
  2022-10-24 16:18       ` Waiman Long
  2 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-19  0:39 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Mukesh Ojha, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, linux-kernel, john.p.donnelly,
	Ting11 Wang 王婷

On 10/18/22 19:51, Hillf Danton wrote:
> On 18 Oct 2022 13:37:20 -0400 Waiman Long <longman@redhat.com>
>> On 10/18/22 10:13, Mukesh Ojha wrote:
>>> On 10/18/2022 4:44 PM, Hillf Danton wrote:
>>>> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>>>>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore
>>>>>                return sem;
>>>>>            }
>>>>>            adjustment += RWSEM_FLAG_WAITERS;
>>>>> +    } else if ((count & RWSEM_FLAG_HANDOFF) &&
>>>>> +          ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
>>>> Could a couple of CPUs go read slow path in parallel?
>>>>
>> This is under wait_lock protection. So no parallel execution is possible.
> They individually add RWSEM_READER_BIAS to count before taking wait_lock,
> and the check for BIAS here does not cover the case of readers in parallel.
> Is this intended?
>
> Hillf

As I said in the patch description, the lock handoff can only be done if
we can be sure that there are no other active locks outstanding with the
handoff bit set. If, at the time of the check, another reader comes in and
adds its RWSEM_READER_BIAS, the check fails and the CPU will proceed to
put its waiter in the queue and begin sleeping. Hopefully, the last one
left will find that the count has only its RWSEM_READER_BIAS and it can
start the handoff process.

Cheers,
Longman



* Re: [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff
       [not found]       ` <20221019022934.1166-1-hdanton@sina.com>
@ 2022-10-19  2:49         ` Waiman Long
       [not found]         ` <20221019070559.1220-1-hdanton@sina.com>
  1 sibling, 0 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-19  2:49 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Mukesh Ojha, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, linux-kernel, john.p.donnelly,
	Ting11 Wang 王婷


On 10/18/22 22:29, Hillf Danton wrote:
> On 18 Oct 2022 20:39:59 -0400 Waiman Long <longman@redhat.com>
>> On 10/18/22 19:51, Hillf Danton wrote:
>>> On 18 Oct 2022 13:37:20 -0400 Waiman Long <longman@redhat.com>
>>>> On 10/18/22 10:13, Mukesh Ojha wrote:
>>>>> On 10/18/2022 4:44 PM, Hillf Danton wrote:
>>>>>> On 17 Oct 2022 17:13:55 -0400 Waiman Long <longman@redhat.com>
>>>>>>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore
>>>>>>>                 return sem;
>>>>>>>             }
>>>>>>>             adjustment += RWSEM_FLAG_WAITERS;
>>>>>>> +    } else if ((count & RWSEM_FLAG_HANDOFF) &&
>>>>>>> +          ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
>>>>>> Could a couple of CPUs go read slow path in parallel?
>>>>>>
>>>> This is under wait_lock protection. So no parallel execution is possible.
>>> They individually add RWSEM_READER_BIAS to count before taking wait_lock,
>>> and the check for BIAS here does not cover the case of readers in parallel.
>>> Is this intended?
>>>
>>> Hillf
>> As I said in the patch description, the lock handoff can only be done if
>> we can be sure that there is no other active locks outstanding with the
>> handoff bit set. If at the time of the check, another reader come in and
>> adds its RWSEM_READER_BIAS, the check fail and the cpu will proceed to
>> put its waiter in the queue and begin sleeping. Hopefully, the last one
>> left will find that count has only its RWSEM_READER_BIAS and it can
>> start the handoff process.
> If handoff grants rwsem to a read waiter then the read fast path may revive.
I don't quite understand what you mean by "read fast path may revive".
> And at the time of the check, multiple readers do not break handoff IMO.

I am not saying that multiple readers will break handoff. They will just
delay it until all their temporary RWSEM_READER_BIAS values are taken off.

Cheers,
Longman



* Re: [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff
       [not found]         ` <20221019070559.1220-1-hdanton@sina.com>
@ 2022-10-19 15:02           ` Waiman Long
  0 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-19 15:02 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Mukesh Ojha, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, linux-kernel, john.p.donnelly,
	Ting11 Wang 王婷

On 10/19/22 03:05, Hillf Danton wrote:
> On 18 Oct 2022 22:49:02 -0400 Waiman Long <longman@redhat.com>
>> On 10/18/22 22:29, Hillf Danton wrote:
>>> If handoff grants rwsem to a read waiter then the read fast path may revive.
>> I don't quite understand what you mean by "read fast path may revive".
> Subsequent readers may take rwsem without going the slow path after a
> read waiter is granted.
That may not be true. As long as there are still waiters in the wait
queue, a reader has to go into the slow path and be queued up in the wait
queue. This is to prevent a continuous stream of readers from
starving writers in the wait queue.
>
> OTOH even after the check for single BIAS, another reader may come in
> and add its BIAS to count, which builds the same count as multiple
> readers came in before the check.

That is true, but hopefully we will eventually run out of readers trying
to get a read lock on a given rwsem.

Cheers,
Longman




* Re: [PATCH v3 1/5] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-17 21:13 ` [PATCH v3 1/5] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
@ 2022-10-24 13:17   ` Peter Zijlstra
  2022-10-24 13:50     ` Waiman Long
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2022-10-24 13:17 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷

On Mon, Oct 17, 2022 at 05:13:52PM -0400, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
> 
>   Non-first waiter       First waiter      Lock holder
>   ----------------       ------------      -----------
>   Acquire wait_lock
>   rwsem_try_write_lock():
>     Set handoff bit if RT or
>       wait too long
>     Set waiter->handoff_set
>   Release wait_lock
>                          Acquire wait_lock
>                          Inherit waiter->handoff_set
>                          Release wait_lock
> 					   Clear owner
>                                            Release lock
>   if (waiter.handoff_set) {
>     rwsem_spin_on_owner(();
>     if (OWNER_NULL)
>       goto trylock_again;
>   }
>   trylock_again:
>   Acquire wait_lock
>   rwsem_try_write_lock():
>      if (first->handoff_set && (waiter != first))
>      	return false;
>   Release wait_lock
> 
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter as this can lead to
> live lock.

I'm struggling to connect the Changelog to the actual patch. I see the
problem, but I don't see how the below helps or is even related to the
described problem.

>  kernel/locking/rwsem.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 44873594de03..be2df9ea7c30 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -624,18 +624,16 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>  			 */
>  			if (first->handoff_set && (waiter != first))
>  				return false;
> -
> -			/*
> -			 * First waiter can inherit a previously set handoff
> -			 * bit and spin on rwsem if lock acquisition fails.
> -			 */
> -			if (waiter == first)
> -				waiter->handoff_set = true;
>  		}
>  
>  		new = count;
>  
>  		if (count & RWSEM_LOCK_MASK) {
> +			/*
> +			 * A waiter (first or not) can set the handoff bit
> +			 * if it is an RT task or wait in the wait queue
> +			 * for too long.
> +			 */
>  			if (has_handoff || (!rt_task(waiter->task) &&
>  					    !time_after(jiffies, waiter->timeout)))
>  				return false;
> @@ -651,11 +649,12 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>  	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
>  
>  	/*
> -	 * We have either acquired the lock with handoff bit cleared or
> -	 * set the handoff bit.
> +	 * We have either acquired the lock with handoff bit cleared or set
> +	 * the handoff bit. Only the first waiter can have its handoff_set
> +	 * set here to enable optimistic spinning in slowpath loop.
>  	 */
>  	if (new & RWSEM_FLAG_HANDOFF) {
> -		waiter->handoff_set = true;
> +		first->handoff_set = true;
>  		lockevent_inc(rwsem_wlock_handoff);
>  		return false;
>  	}
> -- 
> 2.31.1
> 


* Re: [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer
  2022-10-17 21:13 ` [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer Waiman Long
@ 2022-10-24 13:31   ` Peter Zijlstra
  2022-10-24 15:55     ` Waiman Long
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2022-10-24 13:31 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷

On Mon, Oct 17, 2022 at 05:13:53PM -0400, Waiman Long wrote:
> Commit 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically
> spin on owner") assumes that when the owner field is changed to NULL,
> the lock will become free soon.  That assumption may not be correct
> especially if the handoff writer doing the spinning is a RT task which
> may preempt another task from completing its action of either freeing
> the rwsem or properly setting up owner.

I'm confused again -- rwsem_*_owner() has
lockdep_assert_preemption_disabled(). But more specifically; why can the
RT task preempt a lock-op like that?


* Re: [PATCH v3 1/5] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-24 13:17   ` Peter Zijlstra
@ 2022-10-24 13:50     ` Waiman Long
  0 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-24 13:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷

On 10/24/22 09:17, Peter Zijlstra wrote:
> On Mon, Oct 17, 2022 at 05:13:52PM -0400, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>> 					   Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>       	return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
> I'm struggling to connect the Changelog to the actual patch. I see the
> problem, but I don't see how the below helps or is even related to the
> described problem.

Sorry if the description isn't clear, I will rephrase it to make it 
clearer. The basic idea is that a non-first waiter can mistakenly 
believe that it can spin on the lock. However, when 
rwsem_try_write_lock() is called, it can never acquire the lock and move 
on because it is not the first waiter:

                        if (first->handoff_set && (waiter != first))
                                 return false;

If that waiter happens to be an RT task, it can block the real first
waiter from acquiring the lock if it happens to run on the same CPU.

Cheers,
Longman



* Re: [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer
  2022-10-24 13:31   ` Peter Zijlstra
@ 2022-10-24 15:55     ` Waiman Long
  2022-10-25 11:22       ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2022-10-24 15:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷

On 10/24/22 09:31, Peter Zijlstra wrote:
> On Mon, Oct 17, 2022 at 05:13:53PM -0400, Waiman Long wrote:
>> Commit 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically
>> spin on owner") assumes that when the owner field is changed to NULL,
>> the lock will become free soon.  That assumption may not be correct
>> especially if the handoff writer doing the spinning is a RT task which
>> may preempt another task from completing its action of either freeing
>> the rwsem or properly setting up owner.
> I'm confused again -- rwsem_*_owner() has
> lockdep_assert_preemption_disabled(). But more specifically; why can the
> RT task preempt a lock-op like that?

There is a special case raised by Mukesh that can happen. I quoted his 
text here:

---------------------------

Looks like, There is still a window for a race.

There is a chance when a reader who came first added it's BIAS and goes 
to slowpath and before it gets added to wait list it got preempted by RT 
task which  goes to slowpath as well and being the first waiter gets its 
hand-off bit set and not able to get the lock due to following condition 
in rwsem_try_write_lock()

  630                 if (count & RWSEM_LOCK_MASK) {  ==> reader has 
sets its bias
..
...

  634
  635                         new |= RWSEM_FLAG_HANDOFF;
  636                 } else {
  637                         new |= RWSEM_WRITER_LOCKED;


---------------------->----------------------->-------------------------

First reader (1)          writer(2) RT task             Lock holder(3)

It sets
RWSEM_READER_BIAS.
while it is going to
slowpath(as the lock
was held by (3)) and
before it got added
to the waiters list
it got preempted
by (2).
                         RT task also takes
                         the slowpath and add              release the
                         itself into waiting list          rwsem lock
             and since it is the first         clear the
                         it is the next one to get         owner.
                         the lock but it can not
                         get the lock as (count &
                         RWSEM_LOCK_MASK) is set
                         as (1) has added it but
                         not able to remove its
             adjustment.

----------------------

To fix that, we either have to disable preemption in down_read() and
re-enable it in rwsem_down_read_slowpath() after decrementing the
RWSEM_READER_BIAS, or limit the number of trylock-spinning attempts
like this patch does. The latter approach seems a bit less messy, and I
am going to take it back out anyway in patch 4. I will put a summary of
that special case in the patch description.

Cheers,
Longman



* Re: [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff
       [not found]     ` <20221018235138.1088-1-hdanton@sina.com>
  2022-10-19  0:39       ` Waiman Long
       [not found]       ` <20221019022934.1166-1-hdanton@sina.com>
@ 2022-10-24 16:18       ` Waiman Long
  2 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-24 16:18 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Mukesh Ojha, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, linux-kernel, john.p.donnelly,
	Ting11 Wang 王婷

On 10/18/22 19:51, Hillf Danton wrote:
> On 18 Oct 2022 13:37:20 -0400 Waiman Long<longman@redhat.com>
>> On 10/18/22 10:13, Mukesh Ojha wrote:
>>> On 10/18/2022 4:44 PM, Hillf Danton wrote:
>>>> On 17 Oct 2022 17:13:55 -0400 Waiman Long<longman@redhat.com>
>>>>> @@ -1067,13 +1119,33 @@ rwsem_down_read_slowpath(struct rw_semaphore
>>>>>                return sem;
>>>>>            }
>>>>>            adjustment += RWSEM_FLAG_WAITERS;
>>>>> +    } else if ((count & RWSEM_FLAG_HANDOFF) &&
>>>>> +          ((count & RWSEM_LOCK_MASK) == RWSEM_READER_BIAS)) {
>>>> Could a couple of CPUs go read slow path in parallel?
>>>>
>> This is under wait_lock protection. So no parallel execution is possible.
> They individually add RWSEM_READER_BIAS to count before taking wait_lock,
> and the check for BIAS here does not cover the case of readers in parallel.
> Is this intended?
>
> Hillf

As I said in the patch description, the lock handoff can only be done if
we can be sure that there are no other active locks outstanding with the
handoff bit set. If, at the time of the check, another reader comes in and
adds its RWSEM_READER_BIAS, the check fails and the CPU will proceed to
put its waiter in the queue and begin sleeping. Hopefully, the last one
left will find that the count has only its RWSEM_READER_BIAS and it can
start the handoff process.

Cheers,
Longman



* Re: [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer
  2022-10-24 15:55     ` Waiman Long
@ 2022-10-25 11:22       ` Peter Zijlstra
  2022-10-25 11:48         ` Peter Zijlstra
       [not found]         ` <20221025145843.2953-1-hdanton@sina.com>
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2022-10-25 11:22 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷

On Mon, Oct 24, 2022 at 11:55:53AM -0400, Waiman Long wrote:

> Looks like, There is still a window for a race.
> 
> There is a chance when a reader who came first added it's BIAS and goes to
> slowpath and before it gets added to wait list it got preempted by RT task
> which  goes to slowpath as well and being the first waiter gets its hand-off
> bit set and not able to get the lock due to following condition in
> rwsem_try_write_lock()
> 
>  630                 if (count & RWSEM_LOCK_MASK) {  ==> reader has sets its
> bias
> ..
> ...
> 
>  634
>  635                         new |= RWSEM_FLAG_HANDOFF;
>  636                 } else {
>  637                         new |= RWSEM_WRITER_LOCKED;
> 
> 
> ---------------------->----------------------->-------------------------
> 
> First reader (1)          writer(2) RT task             Lock holder(3)
> 
> It sets
> RWSEM_READER_BIAS.
> while it is going to
> slowpath(as the lock
> was held by (3)) and
> before it got added
> to the waiters list
> it got preempted
> by (2).
>                         RT task also takes
>                         the slowpath and add              release the
>                         itself into waiting list          rwsem lock
>             and since it is the first         clear the
>                         it is the next one to get         owner.
>                         the lock but it can not
>                         get the lock as (count &
>                         RWSEM_LOCK_MASK) is set
>                         as (1) has added it but
>                         not able to remove its
>             adjustment.
> 
> ----------------------
> 
> To fix that we either has to disable preemption in down_read() and reenable
> it in rwsem_down_read_slowpath after decrementing the RWSEM_READER_BIAS or
> to limit the number of trylock-spinning attempt like this patch. The latter
> approach seems a bit less messy and I am going to take it back out anyway in
> patch 4. I will put a summary of that special case in the patch description.

Funny, I find the former approach much saner. Disabling preemption
around the whole thing fixes the fundamental problem while spin-limiting
is a band-aid.

Note how rwsem_write_trylock() already does preempt_disable(), having
the read-side do something similar only makes sense.


* Re: [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer
  2022-10-25 11:22       ` Peter Zijlstra
@ 2022-10-25 11:48         ` Peter Zijlstra
  2022-10-25 19:55           ` Waiman Long
       [not found]         ` <20221025145843.2953-1-hdanton@sina.com>
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2022-10-25 11:48 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷

On Tue, Oct 25, 2022 at 01:22:22PM +0200, Peter Zijlstra wrote:

> Funny, I find the former approach much saner. Disabling preemption
> around the whole thing fixes the fundamental problem while spin-limiting
> is a band-aid.
> 
> Note how rwsem_write_trylock() already does preempt_disable(), having
> the read-side do something similar only makes sense.

Something like the completely untested below perhaps...


diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 44873594de03..350fb004b0fb 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -256,16 +256,13 @@ static inline bool rwsem_read_trylock(struct rw_semaphore *sem, long *cntp)
 static inline bool rwsem_write_trylock(struct rw_semaphore *sem)
 {
 	long tmp = RWSEM_UNLOCKED_VALUE;
-	bool ret = false;
 
-	preempt_disable();
 	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, RWSEM_WRITER_LOCKED)) {
 		rwsem_set_owner(sem);
-		ret = true;
+		return true;
 	}
 
-	preempt_enable();
-	return ret;
+	return false;
 }
 
 /*
@@ -717,7 +714,6 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 		return false;
 	}
 
-	preempt_disable();
 	/*
 	 * Disabling preemption is equivalent to an RCU read-side critical
 	 * section, thus the task_struct structure won't go away.
@@ -729,7 +725,6 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 	if ((flags & RWSEM_NONSPINNABLE) ||
 	    (owner && !(flags & RWSEM_READER_OWNED) && !owner_on_cpu(owner)))
 		ret = false;
-	preempt_enable();
 
 	lockevent_cond_inc(rwsem_opt_fail, !ret);
 	return ret;
@@ -829,8 +824,6 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	int loop = 0;
 	u64 rspin_threshold = 0;
 
-	preempt_disable();
-
 	/* sem->wait_lock should not be held when doing optimistic spinning */
 	if (!osq_lock(&sem->osq))
 		goto done;
@@ -938,7 +931,6 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	}
 	osq_unlock(&sem->osq);
 done:
-	preempt_enable();
 	lockevent_cond_inc(rwsem_opt_fail, !taken);
 	return taken;
 }
@@ -1092,7 +1084,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 			/* Ordered by sem->wait_lock against rwsem_mark_wake(). */
 			break;
 		}
-		schedule();
+		schedule_preempt_disabled();
 		lockevent_inc(rwsem_sleep_reader);
 	}
 
@@ -1179,15 +1171,12 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		if (waiter.handoff_set) {
 			enum owner_state owner_state;
 
-			preempt_disable();
 			owner_state = rwsem_spin_on_owner(sem);
-			preempt_enable();
-
 			if (owner_state == OWNER_NULL)
 				goto trylock_again;
 		}
 
-		schedule();
+		schedule_preempt_disabled();
 		lockevent_inc(rwsem_sleep_writer);
 		set_current_state(state);
 trylock_again:
@@ -1254,14 +1243,20 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
  */
 static inline int __down_read_common(struct rw_semaphore *sem, int state)
 {
+	int ret = 0;
 	long count;
 
+	preempt_disable();
 	if (!rwsem_read_trylock(sem, &count)) {
-		if (IS_ERR(rwsem_down_read_slowpath(sem, count, state)))
-			return -EINTR;
+		if (IS_ERR(rwsem_down_read_slowpath(sem, count, state))) {
+			ret = -EINTR;
+			goto out;
+		}
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	}
-	return 0;
+out:
+	preempt_enable();
+	return ret;
 }
 
 static inline void __down_read(struct rw_semaphore *sem)
@@ -1281,19 +1276,23 @@ static inline int __down_read_killable(struct rw_semaphore *sem)
 
 static inline int __down_read_trylock(struct rw_semaphore *sem)
 {
+	int ret = 0;
 	long tmp;
 
 	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
 
+	preempt_disable();
 	tmp = atomic_long_read(&sem->count);
 	while (!(tmp & RWSEM_READ_FAILED_MASK)) {
 		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
 						    tmp + RWSEM_READER_BIAS)) {
 			rwsem_set_reader_owned(sem);
-			return 1;
+			ret = 1;
+			break;
 		}
 	}
-	return 0;
+	preempt_enable();
+	return ret;
 }
 
 /*
@@ -1301,10 +1300,14 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
  */
 static inline int __down_write_common(struct rw_semaphore *sem, int state)
 {
+	int ret = 0;
+
+	preempt_disable();
 	if (unlikely(!rwsem_write_trylock(sem))) {
 		if (IS_ERR(rwsem_down_write_slowpath(sem, state)))
-			return -EINTR;
+			ret = -EINTR;
 	}
+	preempt_enable();
 
-	return 0;
+	return ret;
 }
@@ -1321,8 +1324,14 @@ static inline int __down_write_killable(struct rw_semaphore *sem)
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
+	int ret;
+
+	preempt_disable();
 	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
-	return rwsem_write_trylock(sem);
+	ret = rwsem_write_trylock(sem);
+	preempt_enable();
+
+	return ret;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer
       [not found]         ` <20221025145843.2953-1-hdanton@sina.com>
@ 2022-10-25 19:00           ` Waiman Long
  0 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-25 19:00 UTC (permalink / raw)
  To: Hillf Danton, Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	john.p.donnelly, Mukesh Ojha, Ting11 Wang 王婷

On 10/25/22 10:58, Hillf Danton wrote:
>> @@ -1179,15 +1171,12 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>>   		if (waiter.handoff_set) {
>>   			enum owner_state owner_state;
>>   
>> -			preempt_disable();
>>   			owner_state = rwsem_spin_on_owner(sem);
>> -			preempt_enable();
>> -
>>   			if (owner_state == OWNER_NULL)
>>   				goto trylock_again;
>>   		}
> 	__up_write()
> 	{
> 	rwsem_clear_owner(sem);
> 	/*
> 	 If a lockup can happen when a bound kworker gets preempted here by
> 	 a FIFO acquirer for write, then this is a case of preemption deeper
> 	 than previously thought, IMO.
> 	*/
> 	tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
> 	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
> 		rwsem_wake(sem);
>
A preempt_disable()/preempt_enable() pair was added to __up_write() by
commit 48dfb5d2560 ("locking/rwsem: Disable preemption while trying for
rwsem lock"), so that should not be a problem. However, it does mean that
this change, if implemented, depends on the presence of that commit to be
functionally complete.
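
For reference, with that commit the relevant part of __up_write() looks
roughly like this (paraphrased from memory, debug checks omitted, so the
exact placement in the tree may differ slightly):

	static inline void __up_write(struct rw_semaphore *sem)
	{
		long tmp;

		preempt_disable();
		/*
		 * No preemption window between clearing the owner and
		 * releasing the writer bit.
		 */
		rwsem_clear_owner(sem);
		tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED,
						    &sem->count);
		preempt_enable();

		if (unlikely(tmp & RWSEM_FLAG_WAITERS))
			rwsem_wake(sem);
	}

So a bound kworker can no longer be preempted by a FIFO writer between
clearing the owner and dropping RWSEM_WRITER_LOCKED.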

Cheers,
Longman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer
  2022-10-25 11:48         ` Peter Zijlstra
@ 2022-10-25 19:55           ` Waiman Long
  2022-10-25 20:14             ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2022-10-25 19:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷

On 10/25/22 07:48, Peter Zijlstra wrote:
> On Tue, Oct 25, 2022 at 01:22:22PM +0200, Peter Zijlstra wrote:
>
>> Funny, I find the former approach much saner. Disabling preemption
>> around the whole thing fixes the fundamental problem while spin-limiting
>> is a band-aid.
>>
>> Note how rwsem_write_trylock() already does preempt_disable(), having
>> the read-side do something similar only makes sense.
> Something like the completely untested below perhaps...

That is quite a number of changes spread over many different functions.
That is the kind of change that may make it harder to backport to
stable releases.

This patch is just a stop-gap measure for stable releases, which I
essentially revert in a later patch. I have no objection to disabling
preemption within the rwsem code; my concern is only about what can be
backported to a stable release. So I can add another patch on top of the
series to essentially do that.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer
  2022-10-25 19:55           ` Waiman Long
@ 2022-10-25 20:14             ` Peter Zijlstra
  2022-10-26  1:44               ` Waiman Long
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2022-10-25 20:14 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷

On Tue, Oct 25, 2022 at 03:55:09PM -0400, Waiman Long wrote:

> That is quite a number of changes spread over many different functions. That
> is the kind of changes that may make it harder to backport to stable
> releases.

Yeah; don't care. It's the right thing to do. It also doesn't need to be
reverted since it's a sane and good property for lock-ops to have.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer
  2022-10-25 20:14             ` Peter Zijlstra
@ 2022-10-26  1:44               ` Waiman Long
  0 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2022-10-26  1:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	john.p.donnelly, Hillf Danton, Mukesh Ojha,
	Ting11 Wang 王婷

On 10/25/22 16:14, Peter Zijlstra wrote:
> On Tue, Oct 25, 2022 at 03:55:09PM -0400, Waiman Long wrote:
>
>> That is quite a number of changes spread over many different functions. That
>> is the kind of changes that may make it harder to backport to stable
>> releases.
> Yeah; don't care. it's the right thing to do. It also doesn't need to be
> reverted since it's a sane and good property for lock-ops to have.

I am sorry for the confusion. What I was referring to is the original
patch 2 of the series, not what you have proposed. I am fine with
disabling preemption while running in the core rwsem code.

As a counter-proposal, I suggest modifying patch 2 to disable preemption
in the reader rwsem code, as the writer code already has preemption
disabled in the most critical parts. Then I will add another patch to
complete the writer-side conversion for symmetry (a rough sketch of the
reader-side scope follows below). Are you OK with that?
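
For the reader side, the conversion would be along these lines (an
untested sketch to illustrate the scope, not an actual patch; debug
checks omitted), in addition to the preempt_disable()/preempt_enable()
pairs your diff puts in __down_read_common() and __down_read_trylock():

	static inline void __up_read(struct rw_semaphore *sem)
	{
		long tmp;

		/* Keep the reader non-preemptible until its bias is gone. */
		preempt_disable();
		rwsem_clear_reader_owned(sem);
		tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS,
						     &sem->count);
		if (unlikely((tmp & (RWSEM_LOCK_MASK | RWSEM_FLAG_WAITERS)) ==
			     RWSEM_FLAG_WAITERS)) {
			clear_nonspinnable(sem);
			rwsem_wake(sem);
		}
		preempt_enable();
	}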

Cheers,
Longman


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-10-26  1:44 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-17 21:13 [PATCH v3 0/5] lockinig/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
2022-10-17 21:13 ` [PATCH v3 1/5] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
2022-10-24 13:17   ` Peter Zijlstra
2022-10-24 13:50     ` Waiman Long
2022-10-17 21:13 ` [PATCH v3 2/5] locking/rwsem: Limit # of null owner retries for handoff writer Waiman Long
2022-10-24 13:31   ` Peter Zijlstra
2022-10-24 15:55     ` Waiman Long
2022-10-25 11:22       ` Peter Zijlstra
2022-10-25 11:48         ` Peter Zijlstra
2022-10-25 19:55           ` Waiman Long
2022-10-25 20:14             ` Peter Zijlstra
2022-10-26  1:44               ` Waiman Long
     [not found]         ` <20221025145843.2953-1-hdanton@sina.com>
2022-10-25 19:00           ` Waiman Long
2022-10-17 21:13 ` [PATCH v3 3/5] locking/rwsem: Change waiter->hanodff_set to a handoff_state enum Waiman Long
2022-10-17 21:13 ` [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff Waiman Long
2022-10-17 21:13 ` [PATCH v3 5/5] locking/rwsem: Update handoff lock events tracking Waiman Long
     [not found] ` <20221018111424.1007-1-hdanton@sina.com>
2022-10-18 14:13   ` [PATCH v3 4/5] locking/rwsem: Enable direct rwsem lock handoff Mukesh Ojha
2022-10-18 17:37     ` Waiman Long
     [not found]     ` <20221018235138.1088-1-hdanton@sina.com>
2022-10-19  0:39       ` Waiman Long
     [not found]       ` <20221019022934.1166-1-hdanton@sina.com>
2022-10-19  2:49         ` Waiman Long
     [not found]         ` <20221019070559.1220-1-hdanton@sina.com>
2022-10-19 15:02           ` Waiman Long
2022-10-24 16:18       ` Waiman Long
