* [RFC PATCH] locking/rwsem: Add upgrade_read()
@ 2021-09-22 19:36 Waiman Long
  2021-09-22 20:06 ` Peter Zijlstra
  2021-09-23  1:16 ` Boqun Feng
  0 siblings, 2 replies; 5+ messages in thread
From: Waiman Long @ 2021-09-22 19:36 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, Davidlohr Bueso, Waiman Long

Currently there are about 12 instances in the kernel where an up_read()
is immediately followed by a down_write() of the same lock. For example,

  drivers/tty/n_tty.c:		up_read(&tty->termios_rwsem);
  drivers/tty/n_tty.c-		down_write(&tty->termios_rwsem);

Since we have already provided a downgrade_write() function, we may as
well provide an upgrade_read() function to make the code easier to read
and the intention clearer.

If the current task is the only reader, the upgrade can be done by a
single atomic operation. If not, the upgrade will have to be done by a
separate up_read() call followed by a down_write(). In the former case,
the handoff bit is not considered and the waiter will have to wait a
bit longer to acquire the lock.

The new upgrade_read() function returns a value of 0 for safe upgrade
where rwsem protected data won't change. Otherwise a value of 1 is
returned to indicate unsafe upgrade where rwsem protected data may
change during the upgrade process.

For PREEMPT_RT, it falls back to up_read() followed by down_write()
for simplicity.

Some uses of down_write() with long lock hold time may be changed
to the following format in the future:

	down_read()
	/* check data */
	if (upgrade_read()) {
		/* unsafe upgrade, recheck data */
	}
	/* update data */
	up_write();

As long as the "recheck data" and "update data" parts are relatively
short compared with the "check data" part, this conversion may help to
improve parallelism and reduce lock contention.
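
As an illustration only, a hypothetical caller might be converted like
this (the structure and field names are made up and not taken from any
of the existing call sites):

	down_read(&foo->rwsem);
	if (foo->state != FOO_READY) {		/* check data */
		up_read(&foo->rwsem);
		return -EBUSY;
	}
	if (upgrade_read(&foo->rwsem)) {
		/* unsafe upgrade: a writer may have run, recheck data */
		if (foo->state != FOO_READY) {
			up_write(&foo->rwsem);
			return -EBUSY;
		}
	}
	foo->nr_users++;			/* update data */
	up_write(&foo->rwsem);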

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/rwsem.h  |  5 ++++
 kernel/locking/rwsem.c | 53 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 352c6127cb90..8ece58224f25 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -207,6 +207,11 @@ extern void up_write(struct rw_semaphore *sem);
  */
 extern void downgrade_write(struct rw_semaphore *sem);
 
+/*
+ * upgrade read lock to write lock
+ */
+extern int upgrade_read(struct rw_semaphore *sem);
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 /*
  * nested locking. NOTE: rwsems are not allowed to recurse
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 000e8d5a2884..aeb5b0668304 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1203,6 +1203,29 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
 	return sem;
 }
 
+/*
+ * Try to upgrade read lock to write lock
+ */
+static inline int __try_upgrade_read(struct rw_semaphore *sem)
+{
+	long count = atomic_long_read(&sem->count);
+
+	WARN_ON_ONCE(count & RWSEM_WRITER_LOCKED);
+
+	/*
+	 * When upgrading from shared to exclusive ownership,
+	 * anything inside the write-locked region cannot leak
+	 * into the read side. Use an ACQUIRE semantics.
+	 */
+	if (((count & RWSEM_READER_MASK) == RWSEM_READER_BIAS) &&
+	     atomic_long_try_cmpxchg_acquire(&sem->count, &count,
+			count - RWSEM_READER_BIAS + RWSEM_WRITER_LOCKED)) {
+		rwsem_set_owner(sem);
+		return 1;
+	}
+	return 0;
+}
+
 /*
  * lock for reading
  */
@@ -1438,6 +1461,11 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 	rwbase_write_downgrade(&sem->rwbase);
 }
 
+static inline int __try_upgrade_read(struct rw_semaphore *sem)
+{
+	return 0;
+}
+
 /* Debug stubs for the common API */
 #define DEBUG_RWSEMS_WARN_ON(c, sem)
 
@@ -1581,6 +1609,31 @@ void downgrade_write(struct rw_semaphore *sem)
 }
 EXPORT_SYMBOL(downgrade_write);
 
+/*
+ * Upgrade read lock to write lock
+ *
+ * Return: 0 when upgrade is safe, i.e. rwsem protected data do not change;
+ *         1 when upgrade is unsafe as rwsem protected data may have changed.
+ */
+int upgrade_read(struct rw_semaphore *sem)
+{
+	if (__try_upgrade_read(sem)) {
+		rwsem_release(&sem->dep_map, _RET_IP_);
+		rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
+		return 0;
+	}
+
+	/*
+	 * We cannot directly upgrade to the write lock, just do a regular
+	 * up_read() and down_write() sequence. The data protected by the
+	 * rwsem may have changed before the write lock is acquired.
+	 */
+	down_read(sem);
+	up_write(sem);
+	return 1;
+}
+EXPORT_SYMBOL(upgrade_read);
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
 void down_read_nested(struct rw_semaphore *sem, int subclass)
-- 
2.18.1



* Re: [RFC PATCH] locking/rwsem: Add upgrade_read()
  2021-09-22 19:36 [RFC PATCH] locking/rwsem: Add upgrade_read() Waiman Long
@ 2021-09-22 20:06 ` Peter Zijlstra
  2021-09-22 20:19   ` Waiman Long
  2021-09-23  1:16 ` Boqun Feng
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2021-09-22 20:06 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel, Davidlohr Bueso

On Wed, Sep 22, 2021 at 03:36:57PM -0400, Waiman Long wrote:
> Currently there are about 12 instances in the kernel where an up_read()
> is immediately followed by a down_write() of the same lock. For example,
> 
>   drivers/tty/n_tty.c:		up_read(&tty->termios_rwsem);
>   drivers/tty/n_tty.c-		down_write(&tty->termios_rwsem);

And TTY is a high performance issue, that requires hacks like this?

> Since we have already provided a downgrade_write() function, we may as
> well provide an upgrade_read() function to make the code easier to read
> and the intention clearer.
> 
> If the current task is the only reader, the upgrade can be done by a
> single atomic operation. If not, the upgrade will have to be done by a
> separate up_read() call followed by a down_write(). In the former case,
> the handoff bit is not considered and the waiter will have to wait a
> bit longer to acquire the lock.
> 
> The new upgrade_read() function returns a value of 0 for safe upgrade
> where rwsem protected data won't change. Otherwise a value of 1 is
> returned to indicate unsafe upgrade where rwsem protected data may
> change during the upgrade process.

Yuck...

Is there any workload where this is a massive win? I'm thinking that
either the lock is contended and you get the unsafe option which is the
same as today, or the lock isn't contended and you would've gotten
fast-paths and you barely save anything anyway.

Also, -ENODATA


* Re: [RFC PATCH] locking/rwsem: Add upgrade_read()
  2021-09-22 20:06 ` Peter Zijlstra
@ 2021-09-22 20:19   ` Waiman Long
  0 siblings, 0 replies; 5+ messages in thread
From: Waiman Long @ 2021-09-22 20:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel, Davidlohr Bueso

On 9/22/21 4:06 PM, Peter Zijlstra wrote:
> On Wed, Sep 22, 2021 at 03:36:57PM -0400, Waiman Long wrote:
>> Currently there are about 12 instances in the kernel where an up_read()
>> is immediately followed by a down_write() of the same lock. For example,
>>
>>    drivers/tty/n_tty.c:		up_read(&tty->termios_rwsem);
>>    drivers/tty/n_tty.c-		down_write(&tty->termios_rwsem);
> And TTY is a high performance issue, that requires hacks like this?
I won't call it a hack. I consider it better documentation of what the
real intention is.
>
>> Since we have already provided a downgrade_write() function, we may as
>> well provide an upgrade_read() function to make the code easier to read
>> and the intention clearer.
>>
>> If the current task is the only reader, the upgrade can be done by a
>> single atomic operation. If not, the upgrade will have to be done by a
>> separate up_read() call followed by a down_write(). In the former case,
>> the handoff bit is not considered and the waiter will have to wait a
>> bit longer to acquire the lock.
>>
>> The new upgrade_read() function returns a value of 0 for safe upgrade
>> where rwsem protected data won't change. Otherwise a value of 1 is
>> returned to indicate unsafe upgrade where rwsem protected data may
>> change during the upgrade process.
> Yuck...
>
> Is there any workload where this is a massive win? I'm thinking that
> either the lock is contended and you get the unsafe option which is the
> same as today, or the lock isn't contended and you would've gotten
> fast-paths and you barely save anything anyway.
>
> Also, -ENODATA
>
Yes, the best-case saving is just an atomic op.

This is just an RFC. I can look for workloads that can benefit from
using this.

Cheers,
Longman



* Re: [RFC PATCH] locking/rwsem: Add upgrade_read()
  2021-09-22 19:36 [RFC PATCH] locking/rwsem: Add upgrade_read() Waiman Long
  2021-09-22 20:06 ` Peter Zijlstra
@ 2021-09-23  1:16 ` Boqun Feng
  2021-09-23  1:45   ` Waiman Long
  1 sibling, 1 reply; 5+ messages in thread
From: Boqun Feng @ 2021-09-23  1:16 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Davidlohr Bueso

On Wed, Sep 22, 2021 at 03:36:57PM -0400, Waiman Long wrote:
> Currently there are about 12 instances in the kernel where an up_read()
> is immediately followed by a down_write() of the same lock. For example,
> 
>   drivers/tty/n_tty.c:		up_read(&tty->termios_rwsem);
>   drivers/tty/n_tty.c-		down_write(&tty->termios_rwsem);
> 
> Since we have already provided a downgrade_write() function, we may as
> well provide an upgrade_read() function to make the code easier to read
> and the intention clearer.
> 
> If the current task is the only reader, the upgrade can be done by a
> single atomic operation. If not, the upgrade will have to be done by a
> separate up_read() call followed by a down_write(). In the former case,
> the handoff bit is not considered and the waiter will have to wait a
> bit longer to acquire the lock.
> 
> The new upgrade_read() function returns a value of 0 for safe upgrade
> where rwsem protected data won't change. Otherwise a value of 1 is
> returned to indicate unsafe upgrade where rwsem protected data may
> change during the upgrade process.
> 
> For PREEMPT_RT, it falls back to up_read() followed by down_write()
> for simplicity.
> 
> Some uses of down_write() with long lock hold time may be changed
> to the following format in the future:
> 
> 	down_read()
> 	/* check data */
> 	if (upgrade_read()) {
> 		/* unsafe upgrade, recheck data */
> 	}
> 	/* update data */
> 	up_write();
> 
> As long as the "recheck data" and "update data" parts are relatively
> short compared with the "check data" part, this conversion may help to
> improve parallelism and reduce lock contention.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  include/linux/rwsem.h  |  5 ++++
>  kernel/locking/rwsem.c | 53 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 58 insertions(+)
> 
> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
> index 352c6127cb90..8ece58224f25 100644
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -207,6 +207,11 @@ extern void up_write(struct rw_semaphore *sem);
>   */
>  extern void downgrade_write(struct rw_semaphore *sem);
>  
> +/*
> + * upgrade read lock to write lock
> + */
> +extern int upgrade_read(struct rw_semaphore *sem);
> +
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  /*
>   * nested locking. NOTE: rwsems are not allowed to recurse
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 000e8d5a2884..aeb5b0668304 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -1203,6 +1203,29 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
>  	return sem;
>  }
>  
> +/*
> + * Try to upgrade read lock to write lock
> + */
> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
> +{
> +	long count = atomic_long_read(&sem->count);
> +
> +	WARN_ON_ONCE(count & RWSEM_WRITER_LOCKED);
> +
> +	/*
> +	 * When upgrading from shared to exclusive ownership,
> +	 * anything inside the write-locked region cannot leak
> +	 * into the read side. Use an ACQUIRE semantics.
> +	 */
> +	if (((count & RWSEM_READER_MASK) == RWSEM_READER_BIAS) &&
> +	     atomic_long_try_cmpxchg_acquire(&sem->count, &count,
> +			count - RWSEM_READER_BIAS + RWSEM_WRITER_LOCKED)) {
> +		rwsem_set_owner(sem);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * lock for reading
>   */
> @@ -1438,6 +1461,11 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
>  	rwbase_write_downgrade(&sem->rwbase);
>  }
>  
> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
> +{
> +	return 0;
> +}
> +
>  /* Debug stubs for the common API */
>  #define DEBUG_RWSEMS_WARN_ON(c, sem)
>  
> @@ -1581,6 +1609,31 @@ void downgrade_write(struct rw_semaphore *sem)
>  }
>  EXPORT_SYMBOL(downgrade_write);
>  
> +/*
> + * Upgrade read lock to write lock
> + *
> + * Return: 0 when upgrade is safe, i.e. rwsem protected data do not change;
> + *         1 when upgrade is unsafe as rwsem protected data may have changed.
> + */
> +int upgrade_read(struct rw_semaphore *sem)
> +{
> +	if (__try_upgrade_read(sem)) {
> +		rwsem_release(&sem->dep_map, _RET_IP_);
> +		rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
> +		return 0;
> +	}
> +
> +	/*
> +	 * We cannot directly upgrade to the write lock, just do a regular
> +	 * up_read() and down_write() sequence. The data protected by the
> +	 * rwsem may have changed before the write lock is acquired.
> +	 */
> +	down_read(sem);
> +	up_write(sem);

Confused, the comment says up_read()+down_write(), however the code is
down_read()+up_write().

Besides, I don't like the idea that the value may have changed before
the write lock is acquired if we call it an "upgrade". Maybe we want an
API like down_read_upgradable(), which can be held in parallel with
other down_read() callers but not with another down_read_upgradable(),
and only the read-side critical section created by
down_read_upgradable() can be upgraded. For the implementation, that
means we need one extra bit for the upgradable state. Thoughts?
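
Something like the following, just to illustrate the intended
semantics (not real code, the exact interface is open for discussion):

	down_read_upgradable(sem);	/* shared with plain readers, but
					 * exclusive against any other
					 * upgradable reader */
	/* check data */
	upgrade_read(sem);		/* wait for the other readers to
					 * drain; no writer can slip in,
					 * so the data cannot have changed */
	/* update data */
	up_write(sem);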

Regards,
Boqun

> +	return 1;
> +}
> +EXPORT_SYMBOL(upgrade_read);
> +
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  
>  void down_read_nested(struct rw_semaphore *sem, int subclass)
> -- 
> 2.18.1
> 


* Re: [RFC PATCH] locking/rwsem: Add upgrade_read()
  2021-09-23  1:16 ` Boqun Feng
@ 2021-09-23  1:45   ` Waiman Long
  0 siblings, 0 replies; 5+ messages in thread
From: Waiman Long @ 2021-09-23  1:45 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Davidlohr Bueso

On 9/22/21 9:16 PM, Boqun Feng wrote:
> On Wed, Sep 22, 2021 at 03:36:57PM -0400, Waiman Long wrote:
>> Currently there are about 12 instances in the kernel where an up_read()
>> is immediately followed by a down_write() of the same lock. For example,
>>
>>    drivers/tty/n_tty.c:		up_read(&tty->termios_rwsem);
>>    drivers/tty/n_tty.c-		down_write(&tty->termios_rwsem);
>>
>> Since we have already provided a downgrade_write() function, we may as
>> well provide an upgrade_read() function to make the code easier to read
>> and the intention clearer.
>>
>> If the current task is the only reader, the upgrade can be done by a
>> single atomic operation. If not, the upgrade will have to be done by a
>> separate up_read() call followed by a down_write(). In the former case,
>> the handoff bit is not considered and the waiter will have to wait a
>> bit longer to acquire the lock.
>>
>> The new upgrade_read() function returns a value of 0 for safe upgrade
>> where rwsem protected data won't change. Otherwise a value of 1 is
>> returned to indicate unsafe upgrade where rwsem protected data may
>> change during the upgrade process.
>>
>> For PREEMPT_RT, it falls back to up_read() followed by down_write()
>> for simplicity.
>>
>> Some uses of down_write() with long lock hold time may be changed
>> to the following format in the future:
>>
>> 	down_read()
>> 	/* check data */
>> 	if (upgrade_read()) {
>> 		/* unsafe upgrade, recheck data */
>> 	}
>> 	/* update data */
>> 	up_write();
>>
>> As long as the "recheck data" and "update data" parts are relatively
>> short compared with the "check data" part, this conversion may help to
>> improve parallelism and reduce lock contention.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   include/linux/rwsem.h  |  5 ++++
>>   kernel/locking/rwsem.c | 53 ++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 58 insertions(+)
>>
>> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
>> index 352c6127cb90..8ece58224f25 100644
>> --- a/include/linux/rwsem.h
>> +++ b/include/linux/rwsem.h
>> @@ -207,6 +207,11 @@ extern void up_write(struct rw_semaphore *sem);
>>    */
>>   extern void downgrade_write(struct rw_semaphore *sem);
>>   
>> +/*
>> + * upgrade read lock to write lock
>> + */
>> +extern int upgrade_read(struct rw_semaphore *sem);
>> +
>>   #ifdef CONFIG_DEBUG_LOCK_ALLOC
>>   /*
>>    * nested locking. NOTE: rwsems are not allowed to recurse
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 000e8d5a2884..aeb5b0668304 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -1203,6 +1203,29 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
>>   	return sem;
>>   }
>>   
>> +/*
>> + * Try to upgrade read lock to write lock
>> + */
>> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
>> +{
>> +	long count = atomic_long_read(&sem->count);
>> +
>> +	WARN_ON_ONCE(count & RWSEM_WRITER_LOCKED);
>> +
>> +	/*
>> +	 * When upgrading from shared to exclusive ownership,
>> +	 * anything inside the write-locked region cannot leak
>> +	 * into the read side. Use an ACQUIRE semantics.
>> +	 */
>> +	if (((count & RWSEM_READER_MASK) == RWSEM_READER_BIAS) &&
>> +	     atomic_long_try_cmpxchg_acquire(&sem->count, &count,
>> +			count - RWSEM_READER_BIAS + RWSEM_WRITER_LOCKED)) {
>> +		rwsem_set_owner(sem);
>> +		return 1;
>> +	}
>> +	return 0;
>> +}
>> +
>>   /*
>>    * lock for reading
>>    */
>> @@ -1438,6 +1461,11 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
>>   	rwbase_write_downgrade(&sem->rwbase);
>>   }
>>   
>> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
>> +{
>> +	return 0;
>> +}
>> +
>>   /* Debug stubs for the common API */
>>   #define DEBUG_RWSEMS_WARN_ON(c, sem)
>>   
>> @@ -1581,6 +1609,31 @@ void downgrade_write(struct rw_semaphore *sem)
>>   }
>>   EXPORT_SYMBOL(downgrade_write);
>>   
>> +/*
>> + * Upgrade read lock to write lock
>> + *
>> + * Return: 0 when upgrade is safe, i.e. rwsem protected data do not change;
>> + *         1 when upgrade is unsafe as rwsem protected data may have changed.
>> + */
>> +int upgrade_read(struct rw_semaphore *sem)
>> +{
>> +	if (__try_upgrade_read(sem)) {
>> +		rwsem_release(&sem->dep_map, _RET_IP_);
>> +		rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
>> +		return 0;
>> +	}
>> +
>> +	/*
>> +	 * We cannot directly upgrade to the write lock, just do a regular
>> +	 * up_read() and down_write() sequence. The data protected by the
>> +	 * rwsem may have changed before the write lock is acquired.
>> +	 */
>> +	down_read(sem);
>> +	up_write(sem);
> Confused, the comment says up_read()+down_write(), however the code is
> down_read()+up_write().
Thanks for catching that typo. My bad.
>
> Besides, I don't like the idea that the value may have changed before
> the write lock is acquired if we call it an "upgrade". Maybe we want an
> API like down_read_upgradable(), which can be held in parallel with
> other down_read() callers but not with another down_read_upgradable(),
> and only the read-side critical section created by
> down_read_upgradable() can be upgraded. For the implementation, that
> means we need one extra bit for the upgradable state. Thoughts?

I like your idea. There are spare bits available and we can dedicate one
bit for that purpose. After successfully acquiring the bit, the reader
can probably spin a little and then insert itself at the head of the
wait queue to sleep. The last exiting reader can then wake it up to
acquire the write lock.
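
Roughly something like the following pseudo-code (all the helpers here
are made up, just to sketch the flow):

	/* caller holds the read lock and owns the new upgradable bit */
	if (reader_count_is_one(sem) && cmpxchg_reader_to_writer(sem))
		return;				/* upgraded immediately */

	optimistically_spin_a_bit(sem);		/* readers may drain soon */
	queue_waiter_at_head(sem, current);	/* ahead of other waiters */
	wait_for_wakeup();			/* last exiting reader wakes
						 * us up as the write owner */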

I will probably use "try_upgrade_read()" to indicate that the attempt
may fail.

Thanks for the suggestion.

Cheers,
Longman


