* [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
@ 2016-05-07  0:20 Waiman Long
  2016-05-07  4:56 ` Ingo Molnar
                   ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: Waiman Long @ 2016-05-07  0:20 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Davidlohr Bueso, Jason Low, Dave Chinner,
	Scott J Norton, Douglas Hatch, Waiman Long

Currently, it is not possible to determine for sure if a reader
owns a rwsem by looking at the content of the rwsem data structure.
This patch adds a new state RWSEM_READER_OWNED to the owner field
to indicate that readers currently own the lock. This enables us to
address the following 2 issues in the rwsem optimistic spinning code:

 1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
    the owner field is NULL which can mean either the readers own
    the lock or the owning writer hasn't set the owner field yet.
    In the latter case, we miss the chance to do optimistic spinning.

 2) While a writer is spinning and a reader takes the lock, the writer
    will continue to spin in the main rwsem_optimistic_spin() loop as
    the owner is NULL.

Adding the new state will allow optimistic spinning to go forward as
long as the owner field is not RWSEM_READER_OWNED and the owner is
running, if set, but stop immediately when that state has been reached.
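
For illustration, the decision described above boils down to roughly the
following self-contained sketch (the helper name and types here are
hypothetical; the real changes are in the diff below):

#include <stdbool.h>

/* Hypothetical, simplified model of the new owner state. */
#define RWSEM_READER_OWNED	((void *)1UL)

/*
 * owner is a snapshot of sem->owner; owner_on_cpu stands in for the
 * real owner->on_cpu check.
 */
static bool keep_spinning(const void *owner, bool owner_on_cpu)
{
	if (owner == RWSEM_READER_OWNED)
		return false;		/* readers hold the lock: stop at once */
	if (!owner)
		return true;		/* free, or the writer hasn't set the field yet */
	return owner_on_cpu;		/* spin only while the owning writer runs */
}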

On a 4-socket Haswell machine running a 4.6-rc1 based kernel, fio was
used to run multithreaded randrw and randwrite tests on the same file
on an XFS partition on top of an NVDIMM. The aggregated bandwidths
before and after the patch were as follows:

  Test      BW before patch     BW after patch  % change
  ----      ---------------     --------------  --------
  randrw         988 MB/s          1192 MB/s      +21%
  randwrite     1513 MB/s          1623 MB/s      +7.3%

The perf profiles of the rwsem_down_write_failed() function in randrw
before and after the patch were:

   19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
   14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed

The actual CPU cycles spent in rwsem_down_write_failed() dropped from
5.88% to 1.52% after the patch.

The xfstests suite was also run and no regression was observed.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 v1->v2:
  - Add rwsem_is_reader_owned() helper & rename rwsem_reader_owned()
    to rwsem_set_reader_owned().
  - Add more comments to clarify the purpose of some of the code
    changes.

 kernel/locking/rwsem-xadd.c |   39 ++++++++++++++++++---------------------
 kernel/locking/rwsem.c      |    8 ++++++--
 kernel/locking/rwsem.h      |   41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 23 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index df4dcb8..620a286 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -155,6 +155,12 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 			/* Last active locker left. Retry waking readers. */
 			goto try_reader_grant;
 		}
+		/*
+		 * It is not really necessary to set it to reader-owned here,
+		 * but it gives the spinners an early indication that the
+		 * readers now have the lock.
+		 */
+		rwsem_set_reader_owned(sem);
 	}
 
 	/* Grant an infinite number of read locks to the readers at the front
@@ -306,16 +312,11 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 
 	rcu_read_lock();
 	owner = READ_ONCE(sem->owner);
-	if (!owner) {
-		long count = READ_ONCE(sem->count);
+	if (!rwsem_is_writer_owned(owner)) {
 		/*
-		 * If sem->owner is not set, yet we have just recently entered the
-		 * slowpath with the lock being active, then there is a possibility
-		 * reader(s) may have the lock. To be safe, bail spinning in these
-		 * situations.
+		 * Don't spin if the rwsem is readers owned.
 		 */
-		if (count & RWSEM_ACTIVE_MASK)
-			ret = false;
+		ret = !rwsem_is_reader_owned(owner);
 		goto done;
 	}
 
@@ -328,8 +329,6 @@ done:
 static noinline
 bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 {
-	long count;
-
 	rcu_read_lock();
 	while (sem->owner == owner) {
 		/*
@@ -350,16 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 	}
 	rcu_read_unlock();
 
-	if (READ_ONCE(sem->owner))
-		return true; /* new owner, continue spinning */
-
 	/*
-	 * When the owner is not set, the lock could be free or
-	 * held by readers. Check the counter to verify the
-	 * state.
+	 * If there is a new owner or the owner is not set, we continue
+	 * spinning.
 	 */
-	count = READ_ONCE(sem->count);
-	return (count == 0 || count == RWSEM_WAITING_BIAS);
+	return !rwsem_is_reader_owned(READ_ONCE(sem->owner));
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
@@ -378,7 +372,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 
 	while (true) {
 		owner = READ_ONCE(sem->owner);
-		if (owner && !rwsem_spin_on_owner(sem, owner))
+		if (rwsem_is_writer_owned(owner) &&
+		   !rwsem_spin_on_owner(sem, owner))
 			break;
 
 		/* wait_lock will be acquired if write_lock is obtained */
@@ -391,9 +386,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * When there's no owner, we might have preempted between the
 		 * owner acquiring the lock and setting the owner field. If
 		 * we're an RT task that will live-lock because we won't let
-		 * the owner complete.
+		 * the owner complete. We also quit if the lock is owned by
+		 * readers.
 		 */
-		if (!owner && (need_resched() || rt_task(current)))
+		if (rwsem_is_reader_owned(owner) ||
+		   (!owner && (need_resched() || rt_task(current))))
 			break;
 
 		/*
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c817216..5838f56 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -22,6 +22,7 @@ void __sched down_read(struct rw_semaphore *sem)
 	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+	rwsem_set_reader_owned(sem);
 }
 
 EXPORT_SYMBOL(down_read);
@@ -33,8 +34,10 @@ int down_read_trylock(struct rw_semaphore *sem)
 {
 	int ret = __down_read_trylock(sem);
 
-	if (ret == 1)
+	if (ret == 1) {
 		rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
+		rwsem_set_reader_owned(sem);
+	}
 	return ret;
 }
 
@@ -124,7 +127,7 @@ void downgrade_write(struct rw_semaphore *sem)
 	 * lockdep: a downgraded write will live on as a write
 	 * dependency.
 	 */
-	rwsem_clear_owner(sem);
+	rwsem_set_reader_owned(sem);
 	__downgrade_write(sem);
 }
 
@@ -138,6 +141,7 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
 	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+	rwsem_set_reader_owned(sem);
 }
 
 EXPORT_SYMBOL(down_read_nested);
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 870ed9a..d7fea18 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -1,3 +1,20 @@
+/*
+ * The owner field of the rw_semaphore structure will be set to
+ * RWSEM_READER_OWNED when a reader grabs the lock. A writer will clear
+ * the owner field when it unlocks. A reader, on the other hand, will
+ * not touch the owner field when it unlocks.
+ *
+ * In essence, the owner field now has the following 3 states:
+ *  1) 0
+ *     - lock is free or the owner hasn't set the field yet
+ *  2) RWSEM_READER_OWNED
+ *     - lock is currently or previously owned by readers (lock is free
+ *       or not set by owner yet)
+ *  3) Other non-zero value
+ *     - a writer owns the lock
+ */
+#define RWSEM_READER_OWNED	1UL
+
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
@@ -9,6 +26,26 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 	sem->owner = NULL;
 }
 
+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+	/*
+	 * We check the owner value first to make sure that we will only
+	 * do a write to the rwsem cacheline when it is really necessary
+	 * to minimize cacheline contention.
+	 */
+	if (sem->owner != (struct task_struct *)RWSEM_READER_OWNED)
+		sem->owner = (struct task_struct *)RWSEM_READER_OWNED;
+}
+
+static inline bool rwsem_is_writer_owned(struct task_struct *owner)
+{
+	return (unsigned long)owner > RWSEM_READER_OWNED;
+}
+
+static inline bool rwsem_is_reader_owned(struct task_struct *owner)
+{
+	return owner == (struct task_struct *)RWSEM_READER_OWNED;
+}
 #else
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
@@ -17,4 +54,8 @@ static inline void rwsem_set_owner(struct rw_semaphore *sem)
 static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 {
 }
+
+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+}
 #endif
-- 
1.7.1


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-07  0:20 [PATCH v2] locking/rwsem: Add reader-owned state to the owner field Waiman Long
@ 2016-05-07  4:56 ` Ingo Molnar
  2016-05-08  3:04   ` Waiman Long
  2016-05-09  8:27 ` Peter Zijlstra
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Ingo Molnar @ 2016-05-07  4:56 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch


* Waiman Long <Waiman.Long@hpe.com> wrote:

> On a 4-socket Haswell machine running on a 4.6-rc1 based kernel, the
> fio test with multithreaded randrw and randwrite tests on the same
> file on a XFS partition on top of a NVDIMM were run, the aggregated
> bandwidths before and after the patch were as follows:
> 
>   Test      BW before patch     BW after patch  % change
>   ----      ---------------     --------------  --------
>   randrw         988 MB/s          1192 MB/s      +21%
>   randwrite     1513 MB/s          1623 MB/s      +7.3%

What testcase/suite is this? I'd like to run this on other machines as well.

Thanks,

	Ingo


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-07  4:56 ` Ingo Molnar
@ 2016-05-08  3:04   ` Waiman Long
  0 siblings, 0 replies; 33+ messages in thread
From: Waiman Long @ 2016-05-08  3:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch

[-- Attachment #1: Type: text/plain, Size: 882 bytes --]

On 05/07/2016 12:56 AM, Ingo Molnar wrote:
> * Waiman Long <Waiman.Long@hpe.com> wrote:
>
>> On a 4-socket Haswell machine running on a 4.6-rc1 based kernel, the
>> fio test with multithreaded randrw and randwrite tests on the same
>> file on a XFS partition on top of a NVDIMM were run, the aggregated
>> bandwidths before and after the patch were as follows:
>>
>>    Test      BW before patch     BW after patch  % change
>>    ----      ---------------     --------------  --------
>>    randrw         988 MB/s          1192 MB/s      +21%
>>    randwrite     1513 MB/s          1623 MB/s      +7.3%
> What testcase/suite is this? I'd like to run this on other machines as well.
>
> Thanks,
>
> 	Ingo

I just used fio on an nvdimm-based xfs filesystem. It is essentially like 
a ramfs filesystem in terms of performance. Attached are the config files 
that I used.

Cheers,
Longman

[-- Attachment #2: rw.fio --]
[-- Type: text/plain, Size: 420 bytes --]

[global]
direct=1
ioengine=libaio
norandommap
randrepeat=0
bs=4K
size=1G
iodepth=1	# pmem has no queue depth
runtime=30
time_based=1
group_reporting
thread=1
gtod_reduce=1	# reduce=1 except for latency test
gtod_cpu=1


## cross-CPU combinations
numjobs=18	
cpus_allowed=0-39

cpus_allowed_policy=split

[drive_0]
filename=/mnt/fio
cpus_allowed=0-17
rw=randrw

[drive_1]
filename=/mnt/fio2
cpus_allowed=18-35
rw=randrw


[-- Attachment #3: wo.fio --]
[-- Type: text/plain, Size: 426 bytes --]

[global]
direct=1
ioengine=libaio
norandommap
randrepeat=0
bs=4K
size=1G
iodepth=1	# pmem has no queue depth
runtime=30
time_based=1
group_reporting
thread=1
gtod_reduce=1	# reduce=1 except for latency test
gtod_cpu=1


## cross-CPU combinations
numjobs=18	
cpus_allowed=0-39

cpus_allowed_policy=split

[drive_0]
filename=/mnt/fio
cpus_allowed=0-17
rw=randwrite

[drive_1]
filename=/mnt/fio2
cpus_allowed=18-35
rw=randwrite



* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-07  0:20 [PATCH v2] locking/rwsem: Add reader-owned state to the owner field Waiman Long
  2016-05-07  4:56 ` Ingo Molnar
@ 2016-05-09  8:27 ` Peter Zijlstra
  2016-05-10  2:24   ` Waiman Long
  2016-05-09 18:44 ` Jason Low
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2016-05-09  8:27 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, linux-kernel, Davidlohr Bueso, Jason Low,
	Dave Chinner, Scott J Norton, Douglas Hatch

On Fri, May 06, 2016 at 08:20:24PM -0400, Waiman Long wrote:
> @@ -391,9 +386,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>  		 * When there's no owner, we might have preempted between the
>  		 * owner acquiring the lock and setting the owner field. If
>  		 * we're an RT task that will live-lock because we won't let
> +		 * the owner complete. We also quit if the lock is owned by
> +		 * readers.

Maybe also note why we quit on readers.

>  		 */
> +		if (rwsem_is_reader_owned(owner) ||
> +		   (!owner && (need_resched() || rt_task(current))))
>  			break;
>  
>  		/*


> diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
> index 870ed9a..d7fea18 100644
> --- a/kernel/locking/rwsem.h
> +++ b/kernel/locking/rwsem.h
> @@ -1,3 +1,20 @@
> +/*
> + * The owner field of the rw_semaphore structure will be set to
> + * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
> + * the owner field when it unlocks. A reader, on the other hand, will
> + * not touch the owner field when it unlocks.
> + *
> + * In essence, the owner field now has the following 3 states:
> + *  1) 0
> + *     - lock is free or the owner hasn't set the field yet
> + *  2) RWSEM_READER_OWNED
> + *     - lock is currently or previously owned by readers (lock is free
> + *       or not set by owner yet)
> + *  3) Other non-zero value
> + *     - a writer owns the lock
> + */
> +#define RWSEM_READER_OWNED	1UL

#define RWSEM_READER_OWNED	((struct task_struct *)1UL)

> +
>  #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
>  static inline void rwsem_set_owner(struct rw_semaphore *sem)
>  {
> @@ -9,6 +26,26 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
>  	sem->owner = NULL;
>  }
>  
> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
> +{
> +	/*
> +	 * We check the owner value first to make sure that we will only
> +	 * do a write to the rwsem cacheline when it is really necessary
> +	 * to minimize cacheline contention.
> +	 */
> +	if (sem->owner != (struct task_struct *)RWSEM_READER_OWNED)
> +		sem->owner = (struct task_struct *)RWSEM_READER_OWNED;

How much if anything did this optimization matter?

> +}
> +
> +static inline bool rwsem_is_writer_owned(struct task_struct *owner)
> +{
> +	return (unsigned long)owner > RWSEM_READER_OWNED;
> +}

Tad too clever that; what does GCC generate if you write the obvious:

	return owner && owner != RWSEM_READER_OWNED;

> +
> +static inline bool rwsem_is_reader_owned(struct task_struct *owner)
> +{
> +	return owner == (struct task_struct *)RWSEM_READER_OWNED;
> +}

So I don't particularly like these names; they read like they take a
rwsem as argument, but they don't.

Would something like: rwsem_owner_is_{reader,writer}() make more sense?
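
Putting the two suggestions together, the helpers might then look
something like this (just a sketch of the proposal, not the actual
follow-up patch):

#define RWSEM_READER_OWNED	((struct task_struct *)1UL)

static inline bool rwsem_owner_is_writer(struct task_struct *owner)
{
	return owner && owner != RWSEM_READER_OWNED;
}

static inline bool rwsem_owner_is_reader(struct task_struct *owner)
{
	return owner == RWSEM_READER_OWNED;
}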

>  #else
>  static inline void rwsem_set_owner(struct rw_semaphore *sem)
>  {


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-07  0:20 [PATCH v2] locking/rwsem: Add reader-owned state to the owner field Waiman Long
  2016-05-07  4:56 ` Ingo Molnar
  2016-05-09  8:27 ` Peter Zijlstra
@ 2016-05-09 18:44 ` Jason Low
  2016-05-10 13:03 ` Davidlohr Bueso
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Jason Low @ 2016-05-09 18:44 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Dave Chinner, Scott J Norton, Douglas Hatch, jason.low2

On Fri, 2016-05-06 at 20:20 -0400, Waiman Long wrote:
> Currently, it is not possible to determine for sure if a reader
> owns a rwsem by looking at the content of the rwsem data structure.
> This patch adds a new state RWSEM_READER_OWNED to the owner field
> to indicate that readers currently own the lock. This enables us to
> address the following 2 issues in the rwsem optimistic spinning code:
> 
>  1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
>     the owner field is NULL which can mean either the readers own
>     the lock or the owning writer hasn't set the owner field yet.
>     In the latter case, we miss the chance to do optimistic spinning.
> 
>  2) While a writer is spinning and a reader takes the lock, the writer
>     will continue to spin in the main rwsem_optimistic_spin() loop as
>     the owner is NULL.
> 
> Adding the new state will allow optimistic spinning to go forward as
> long as the owner field is not RWSEM_READER_OWNED and the owner is
> running, if set, but stop immediately when that state has been reached.
> 
> On a 4-socket Haswell machine running on a 4.6-rc1 based kernel, the
> fio test with multithreaded randrw and randwrite tests on the same
> file on a XFS partition on top of a NVDIMM were run, the aggregated
> bandwidths before and after the patch were as follows:
> 
>   Test      BW before patch     BW after patch  % change
>   ----      ---------------     --------------  --------
>   randrw         988 MB/s          1192 MB/s      +21%
>   randwrite     1513 MB/s          1623 MB/s      +7.3%
> 
> The perf profile of the rwsem_down_write_failed() function in randrw
> before and after the patch were:
> 
>    19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
>    14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
> 
> The actual CPU cycles spend in rwsem_down_write_failed() dropped from
> 5.88% to 1.52% after the patch.
> 
> The xfstests was also run and no regression was observed.
> 
> Signed-off-by: Waiman Long <Waiman.Long@hpe.com>

Acked-by: Jason Low <jason.low2@hp.com>


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-09  8:27 ` Peter Zijlstra
@ 2016-05-10  2:24   ` Waiman Long
  2016-05-10  7:02     ` Peter Zijlstra
  0 siblings, 1 reply; 33+ messages in thread
From: Waiman Long @ 2016-05-10  2:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Davidlohr Bueso, Jason Low,
	Dave Chinner, Scott J Norton, Douglas Hatch

On 05/09/2016 04:27 AM, Peter Zijlstra wrote:
> On Fri, May 06, 2016 at 08:20:24PM -0400, Waiman Long wrote:
>> @@ -391,9 +386,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>>   		 * When there's no owner, we might have preempted between the
>>   		 * owner acquiring the lock and setting the owner field. If
>>   		 * we're an RT task that will live-lock because we won't let
>> +		 * the owner complete. We also quit if the lock is owned by
>> +		 * readers.
> Maybe also note why we quit on readers.

Sure. Will do so.

>>   		*/
>> +		if (rwsem_is_reader_owned(owner) ||
>> +		   (!owner && (need_resched() || rt_task(current))))
>>   			break;
>>
>>   		/*
>
>> diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
>> index 870ed9a..d7fea18 100644
>> --- a/kernel/locking/rwsem.h
>> +++ b/kernel/locking/rwsem.h
>> @@ -1,3 +1,20 @@
>> +/*
>> + * The owner field of the rw_semaphore structure will be set to
>> + * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
>> + * the owner field when it unlocks. A reader, on the other hand, will
>> + * not touch the owner field when it unlocks.
>> + *
>> + * In essence, the owner field now has the following 3 states:
>> + *  1) 0
>> + *     - lock is free or the owner hasn't set the field yet
>> + *  2) RWSEM_READER_OWNED
>> + *     - lock is currently or previously owned by readers (lock is free
>> + *       or not set by owner yet)
>> + *  3) Other non-zero value
>> + *     - a writer owns the lock
>> + */
>> +#define RWSEM_READER_OWNED	1UL
> #define RWSEM_READER_OWNED	((struct task_struct *)1UL)

Will make the change.

>> +
>>   #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
>>   static inline void rwsem_set_owner(struct rw_semaphore *sem)
>>   {
>> @@ -9,6 +26,26 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
>>   	sem->owner = NULL;
>>   }
>>
>> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
>> +{
>> +	/*
>> +	 * We check the owner value first to make sure that we will only
>> +	 * do a write to the rwsem cacheline when it is really necessary
>> +	 * to minimize cacheline contention.
>> +	 */
>> +	if (sem->owner != (struct task_struct *)RWSEM_READER_OWNED)
>> +		sem->owner = (struct task_struct *)RWSEM_READER_OWNED;
> How much if anything did this optimization matter?

I hadn't run any performance test to verify the effectiveness of this 
change. For a reader-heavy rwsem, this change should be able to save 
quite a lot of needless writes to the rwsem cacheline.

>> +}
>> +
>> +static inline bool rwsem_is_writer_owned(struct task_struct *owner)
>> +{
>> +	return (unsigned long)owner > RWSEM_READER_OWNED;
>> +}
> Tad too clever that; what does GCC generate if you write the obvious:
>
> 	return owner && owner != RWSEM_READER_OWNED;

You are right. GCC is intelligent enough to make the necessary 
optimization. I will revert it to this form which is more obvious.

>> +
>> +static inline bool rwsem_is_reader_owned(struct task_struct *owner)
>> +{
>> +	return owner == (struct task_struct *)RWSEM_READER_OWNED;
>> +}
> So I don't particularly like these names; they read like they take a
> rwsem as argument, but they don't.
>
> Would something like: rwsem_owner_is_{reader,writer}() make more sense?

Yes, these names look good to me.

Cheers,
Longman


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-10  2:24   ` Waiman Long
@ 2016-05-10  7:02     ` Peter Zijlstra
  0 siblings, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2016-05-10  7:02 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, linux-kernel, Davidlohr Bueso, Jason Low,
	Dave Chinner, Scott J Norton, Douglas Hatch

On Mon, May 09, 2016 at 10:24:16PM -0400, Waiman Long wrote:
> >>+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
> >>+{
> >>+	/*
> >>+	 * We check the owner value first to make sure that we will only
> >>+	 * do a write to the rwsem cacheline when it is really necessary
> >>+	 * to minimize cacheline contention.
> >>+	 */
> >>+	if (sem->owner != (struct task_struct *)RWSEM_READER_OWNED)
> >>+		sem->owner = (struct task_struct *)RWSEM_READER_OWNED;
> >How much if anything did this optimization matter?
> 
> I hadn't run any performance test to verify the effective of this change.
> For a reader-heavy rwsem, this change should be able to save quite a lot of
> needless write to the rwsem cacheline.

Right; I was just wondering.

> >>+static inline bool rwsem_is_writer_owned(struct task_struct *owner)
> >>+{
> >>+	return (unsigned long)owner > RWSEM_READER_OWNED;
> >>+}
> >Tad too clever that; what does GCC generate if you write the obvious:
> >
> >	return owner && owner != RWSEM_READER_OWNED;
> 
> You are right. GCC is intelligent enough to make the necessary optimization.
> I will revert it to this form which is more obvious.

Yay! thanks for checking. Sometimes GCC throws a wobbly and does
something unexpected, but it tends to get these 'simple' things right.


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-07  0:20 [PATCH v2] locking/rwsem: Add reader-owned state to the owner field Waiman Long
                   ` (2 preceding siblings ...)
  2016-05-09 18:44 ` Jason Low
@ 2016-05-10 13:03 ` Davidlohr Bueso
  2016-05-11 22:04 ` Peter Hurley
  2016-05-19  1:37 ` Dave Chinner
  5 siblings, 0 replies; 33+ messages in thread
From: Davidlohr Bueso @ 2016-05-10 13:03 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Jason Low,
	Dave Chinner, Scott J Norton, Douglas Hatch

On Fri, 06 May 2016, Waiman Long wrote:

>Currently, it is not possible to determine for sure if a reader
>owns a rwsem by looking at the content of the rwsem data structure.
>This patch adds a new state RWSEM_READER_OWNED to the owner field
>to indicate that readers currently own the lock. This enables us to
>address the following 2 issues in the rwsem optimistic spinning code:
>
> 1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
>    the owner field is NULL which can mean either the readers own
>    the lock or the owning writer hasn't set the owner field yet.
>    In the latter case, we miss the chance to do optimistic spinning.
>
> 2) While a writer is spinning and a reader takes the lock, the writer
>    will continue to spin in the main rwsem_optimistic_spin() loop as
>    the owner is NULL.
>
>Adding the new state will allow optimistic spinning to go forward as
>long as the owner field is not RWSEM_READER_OWNED and the owner is
>running, if set, but stop immediately when that state has been reached.
>
>On a 4-socket Haswell machine running on a 4.6-rc1 based kernel, the
>fio test with multithreaded randrw and randwrite tests on the same
>file on a XFS partition on top of a NVDIMM were run, the aggregated
>bandwidths before and after the patch were as follows:
>
>  Test      BW before patch     BW after patch  % change
>  ----      ---------------     --------------  --------
>  randrw         988 MB/s          1192 MB/s      +21%
>  randwrite     1513 MB/s          1623 MB/s      +7.3%
>
>The perf profile of the rwsem_down_write_failed() function in randrw
>before and after the patch were:
>
>   19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
>   14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
>
>The actual CPU cycles spend in rwsem_down_write_failed() dropped from
>5.88% to 1.52% after the patch.
>
>The xfstests was also run and no regression was observed.
>
>Signed-off-by: Waiman Long <Waiman.Long@hpe.com>

Acked-by: Davidlohr Bueso <dave@stgolabs.net>


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-07  0:20 [PATCH v2] locking/rwsem: Add reader-owned state to the owner field Waiman Long
                   ` (3 preceding siblings ...)
  2016-05-10 13:03 ` Davidlohr Bueso
@ 2016-05-11 22:04 ` Peter Hurley
  2016-05-12 20:15   ` Waiman Long
  2016-05-13 15:07   ` Peter Zijlstra
  2016-05-19  1:37 ` Dave Chinner
  5 siblings, 2 replies; 33+ messages in thread
From: Peter Hurley @ 2016-05-11 22:04 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Davidlohr Bueso, Jason Low, Dave Chinner,
	Scott J Norton, Douglas Hatch

On 05/06/2016 05:20 PM, Waiman Long wrote:
> Currently, it is not possible to determine for sure if a reader
> owns a rwsem by looking at the content of the rwsem data structure.
> This patch adds a new state RWSEM_READER_OWNED to the owner field
> to indicate that readers currently own the lock. This enables us to
> address the following 2 issues in the rwsem optimistic spinning code:
> 
>  1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
>     the owner field is NULL which can mean either the readers own
>     the lock or the owning writer hasn't set the owner field yet.
>     In the latter case, we miss the chance to do optimistic spinning.
> 
>  2) While a writer is spinning and a reader takes the lock, the writer
>     will continue to spin in the main rwsem_optimistic_spin() loop as
>     the owner is NULL.
> 
> Adding the new state will allow optimistic spinning to go forward as
> long as the owner field is not RWSEM_READER_OWNED and the owner is
> running, if set, but stop immediately when that state has been reached.

Really good idea.
Some comments below.


> On a 4-socket Haswell machine running on a 4.6-rc1 based kernel, the
> fio test with multithreaded randrw and randwrite tests on the same
> file on a XFS partition on top of a NVDIMM were run, the aggregated
> bandwidths before and after the patch were as follows:
> 
>   Test      BW before patch     BW after patch  % change
>   ----      ---------------     --------------  --------
>   randrw         988 MB/s          1192 MB/s      +21%
>   randwrite     1513 MB/s          1623 MB/s      +7.3%
> 
> The perf profile of the rwsem_down_write_failed() function in randrw
> before and after the patch were:
> 
>    19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
>    14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
> 
> The actual CPU cycles spend in rwsem_down_write_failed() dropped from
> 5.88% to 1.52% after the patch.
> 
> The xfstests was also run and no regression was observed.
> 
> Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
> ---
>  v1->v2:
>   - Add rwsem_is_reader_owned() helper & rename rwsem_reader_owned()
>     to rwsem_set_reader_owned().
>   - Add more comments to clarify the purpose of some of the code
>     changes.
> 
>  kernel/locking/rwsem-xadd.c |   39 ++++++++++++++++++---------------------
>  kernel/locking/rwsem.c      |    8 ++++++--
>  kernel/locking/rwsem.h      |   41 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 65 insertions(+), 23 deletions(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index df4dcb8..620a286 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -155,6 +155,12 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
>  			/* Last active locker left. Retry waking readers. */
>  			goto try_reader_grant;
>  		}
> +		/*
> +		 * It is not really necessary to set it to reader-owned here,
> +		 * but it gives the spinners an early indication that the
> +		 * readers now have the lock.
> +		 */
> +		rwsem_set_reader_owned(sem);
>  	}
>  
>  	/* Grant an infinite number of read locks to the readers at the front
> @@ -306,16 +312,11 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
>  
>  	rcu_read_lock();
>  	owner = READ_ONCE(sem->owner);
> -	if (!owner) {
> -		long count = READ_ONCE(sem->count);
> +	if (!rwsem_is_writer_owned(owner)) {
>  		/*
> -		 * If sem->owner is not set, yet we have just recently entered the
> -		 * slowpath with the lock being active, then there is a possibility
> -		 * reader(s) may have the lock. To be safe, bail spinning in these
> -		 * situations.
> +		 * Don't spin if the rwsem is readers owned.
>  		 */
> -		if (count & RWSEM_ACTIVE_MASK)
> -			ret = false;
> +		ret = !rwsem_is_reader_owned(owner);
>  		goto done;
>  	}

I'm not a big fan of all the helpers; istm like they're obfuscating the more
important requirements of rwsem. For example, this reduces to

	rcu_read_lock();
	owner = READ_ONCE(sem->owner);
	ret = !owner || (rwsem_is_writer_owned(owner) && owner->on_cpu);
	rcu_read_unlock();
	return ret;

  
> @@ -328,8 +329,6 @@ done:
>  static noinline
>  bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
>  {
> -	long count;
> -
>  	rcu_read_lock();
>  	while (sem->owner == owner) {
>  		/*
> @@ -350,16 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
>  	}
>  	rcu_read_unlock();
>  
> -	if (READ_ONCE(sem->owner))
> -		return true; /* new owner, continue spinning */
> -
>  	/*
> -	 * When the owner is not set, the lock could be free or
> -	 * held by readers. Check the counter to verify the
> -	 * state.
> +	 * If there is a new owner or the owner is not set, we continue
> +	 * spinning.
>  	 */
> -	count = READ_ONCE(sem->count);
> -	return (count == 0 || count == RWSEM_WAITING_BIAS);
> +	return !rwsem_is_reader_owned(READ_ONCE(sem->owner));

It doesn't make sense to force reload sem->owner here; if sem->owner
is not being reloaded then the loop above will execute forever.

Arguably, this check should be bumped out to the optimistic spin and
reload/check the owner there?

Or better yet; don't pass the owner in as a parameter at all, but
instead snapshot the owner and check its ownership on entry.

Because see below...

>  }
>  
>  static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
> @@ -378,7 +372,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>  
>  	while (true) {
>  		owner = READ_ONCE(sem->owner);
> -		if (owner && !rwsem_spin_on_owner(sem, owner))
> +		if (rwsem_is_writer_owned(owner) &&
> +		   !rwsem_spin_on_owner(sem, owner))
>  			break;
>  
>  		/* wait_lock will be acquired if write_lock is obtained */
> @@ -391,9 +386,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>  		 * When there's no owner, we might have preempted between the
>  		 * owner acquiring the lock and setting the owner field. If
>  		 * we're an RT task that will live-lock because we won't let
> -		 * the owner complete.
> +		 * the owner complete. We also quit if the lock is owned by
> +		 * readers.
>  		 */
> -		if (!owner && (need_resched() || rt_task(current)))
> +		if (rwsem_is_reader_owned(owner) ||
> +		   (!owner && (need_resched() || rt_task(current))))

This is using the stale owner value that was cached before spinning on the owner;
that can't be right.

Regards,
Peter Hurley

>  			break;
>  
>  		/*
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index c817216..5838f56 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -22,6 +22,7 @@ void __sched down_read(struct rw_semaphore *sem)
>  	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
>  
>  	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
> +	rwsem_set_reader_owned(sem);
>  }
>  
>  EXPORT_SYMBOL(down_read);
> @@ -33,8 +34,10 @@ int down_read_trylock(struct rw_semaphore *sem)
>  {
>  	int ret = __down_read_trylock(sem);
>  
> -	if (ret == 1)
> +	if (ret == 1) {
>  		rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
> +		rwsem_set_reader_owned(sem);
> +	}
>  	return ret;
>  }
>  
> @@ -124,7 +127,7 @@ void downgrade_write(struct rw_semaphore *sem)
>  	 * lockdep: a downgraded write will live on as a write
>  	 * dependency.
>  	 */
> -	rwsem_clear_owner(sem);
> +	rwsem_set_reader_owned(sem);
>  	__downgrade_write(sem);
>  }
>  
> @@ -138,6 +141,7 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
>  	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
>  
>  	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
> +	rwsem_set_reader_owned(sem);
>  }
>  
>  EXPORT_SYMBOL(down_read_nested);
> diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
> index 870ed9a..d7fea18 100644
> --- a/kernel/locking/rwsem.h
> +++ b/kernel/locking/rwsem.h
> @@ -1,3 +1,20 @@
> +/*
> + * The owner field of the rw_semaphore structure will be set to
> + * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
> + * the owner field when it unlocks. A reader, on the other hand, will
> + * not touch the owner field when it unlocks.
> + *
> + * In essence, the owner field now has the following 3 states:
> + *  1) 0
> + *     - lock is free or the owner hasn't set the field yet
> + *  2) RWSEM_READER_OWNED
> + *     - lock is currently or previously owned by readers (lock is free
> + *       or not set by owner yet)
> + *  3) Other non-zero value
> + *     - a writer owns the lock
> + */
> +#define RWSEM_READER_OWNED	1UL
> +
>  #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
>  static inline void rwsem_set_owner(struct rw_semaphore *sem)
>  {
> @@ -9,6 +26,26 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
>  	sem->owner = NULL;
>  }
>  
> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
> +{
> +	/*
> +	 * We check the owner value first to make sure that we will only
> +	 * do a write to the rwsem cacheline when it is really necessary
> +	 * to minimize cacheline contention.
> +	 */
> +	if (sem->owner != (struct task_struct *)RWSEM_READER_OWNED)
> +		sem->owner = (struct task_struct *)RWSEM_READER_OWNED;
> +}
> +
> +static inline bool rwsem_is_writer_owned(struct task_struct *owner)
> +{
> +	return (unsigned long)owner > RWSEM_READER_OWNED;
> +}
> +
> +static inline bool rwsem_is_reader_owned(struct task_struct *owner)
> +{
> +	return owner == (struct task_struct *)RWSEM_READER_OWNED;
> +}
>  #else
>  static inline void rwsem_set_owner(struct rw_semaphore *sem)
>  {
> @@ -17,4 +54,8 @@ static inline void rwsem_set_owner(struct rw_semaphore *sem)
>  static inline void rwsem_clear_owner(struct rw_semaphore *sem)
>  {
>  }
> +
> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
> +{
> +}
>  #endif
> 


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-11 22:04 ` Peter Hurley
@ 2016-05-12 20:15   ` Waiman Long
  2016-05-12 21:27     ` Peter Hurley
  2016-05-13 15:07   ` Peter Zijlstra
  1 sibling, 1 reply; 33+ messages in thread
From: Waiman Long @ 2016-05-12 20:15 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch

On 05/11/2016 06:04 PM, Peter Hurley wrote:
>   	/* Grant an infinite number of read locks to the readers at the front
> @@ -306,16 +312,11 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
>
>   	rcu_read_lock();
>   	owner = READ_ONCE(sem->owner);
> -	if (!owner) {
> -		long count = READ_ONCE(sem->count);
> +	if (!rwsem_is_writer_owned(owner)) {
>   		/*
> -		 * If sem->owner is not set, yet we have just recently entered the
> -		 * slowpath with the lock being active, then there is a possibility
> -		 * reader(s) may have the lock. To be safe, bail spinning in these
> -		 * situations.
> +		 * Don't spin if the rwsem is readers owned.
>   		 */
> -		if (count & RWSEM_ACTIVE_MASK)
> -			ret = false;
> +		ret = !rwsem_is_reader_owned(owner);
>   		goto done;
>   	}
> I'm not a big fan of all the helpers; istm like they're obfuscating the more
> important requirements of rwsem. For example, this reduces to
>
> 	rcu_read_lock();
> 	owner = READ_ONCE(sem->owner);
> 	ret = !owner || (rwsem_is_writer_owned(owner) && owner->on_cpu);
> 	rcu_read_unlock();
> 	return ret;
>

Using helper functions usually makes the code easier to read. This is 
helpful for the rwsem code, which can be hard to understand, especially 
how the different count values interact.


>
>> @@ -328,8 +329,6 @@ done:
>>   static noinline
>>   bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
>>   {
>> -	long count;
>> -
>>   	rcu_read_lock();
>>   	while (sem->owner == owner) {
>>   		/*
>> @@ -350,16 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
>>   	}
>>   	rcu_read_unlock();
>>
>> -	if (READ_ONCE(sem->owner))
>> -		return true; /* new owner, continue spinning */
>> -
>>   	/*
>> -	 * When the owner is not set, the lock could be free or
>> -	 * held by readers. Check the counter to verify the
>> -	 * state.
>> +	 * If there is a new owner or the owner is not set, we continue
>> +	 * spinning.
>>   	 */
>> -	count = READ_ONCE(sem->count);
>> -	return (count == 0 || count == RWSEM_WAITING_BIAS);
>> +	return !rwsem_is_reader_owned(READ_ONCE(sem->owner));
> It doesn't make sense to force reload sem->owner here; if sem->owner
> is not being reloaded then the loop above will execute forever.

I agree that we don't actually need to use READ_ONCE() here for 
sem->owner, as the barrier() call will force the reloading. It is more 
like a habit to use it for a public variable; otherwise we have to think 
a bit harder to make sure that we are doing the right thing.

> Arguably, this check should be bumped out to the optimistic spin and
> reload/check the owner there?
>
> Or better yet; don't pass the owner in as a parameter at all, but
> instead snapshot the owner and check its ownership on entry.

That will make the main optimistic spinning loop more complex.

>
> Because see below...
>
>>   }
>>
>>   static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>> @@ -378,7 +372,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>>
>>   	while (true) {
>>   		owner = READ_ONCE(sem->owner);
>> -		if (owner && !rwsem_spin_on_owner(sem, owner))
>> +		if (rwsem_is_writer_owned(owner) &&
>> +		   !rwsem_spin_on_owner(sem, owner))
>>   			break;
>>
>>   		/* wait_lock will be acquired if write_lock is obtained */
>> @@ -391,9 +386,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>>   		 * When there's no owner, we might have preempted between the
>>   		 * owner acquiring the lock and setting the owner field. If
>>   		 * we're an RT task that will live-lock because we won't let
>> -		 * the owner complete.
>> +		 * the owner complete. We also quit if the lock is owned by
>> +		 * readers.
>>   		 */
>> -		if (!owner && (need_resched() || rt_task(current)))
>> +		if (rwsem_is_reader_owned(owner) ||
>> +		   (!owner && (need_resched() || rt_task(current))))
> This is using the stale owner value that was cached before spinning on the owner;
> That can't be right.

The code is going to loop back and reload the new owner value anyway. It 
is just a bit of additional latency. I will move the is_reader check up 
after loading sem->owner to clear any confusion.
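
For example, the reordered check might look something like this (a
sketch only, not the actual follow-up patch):

	while (true) {
		owner = READ_ONCE(sem->owner);
		/* Quit as soon as readers are seen to hold the lock. */
		if (rwsem_is_reader_owned(owner))
			break;
		if (rwsem_is_writer_owned(owner) &&
		    !rwsem_spin_on_owner(sem, owner))
			break;

		/* wait_lock will be acquired if write_lock is obtained */
		if (rwsem_try_write_lock_unqueued(sem)) {
			taken = true;
			break;
		}

		if (!owner && (need_resched() || rt_task(current)))
			break;

		cpu_relax_lowlatency();
	}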

Cheers,
Longman


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-12 20:15   ` Waiman Long
@ 2016-05-12 21:27     ` Peter Hurley
  2016-05-12 23:13       ` Waiman Long
  0 siblings, 1 reply; 33+ messages in thread
From: Peter Hurley @ 2016-05-12 21:27 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch

On 05/12/2016 01:15 PM, Waiman Long wrote:
> On 05/11/2016 06:04 PM, Peter Hurley wrote:

[...]

>>
>>> @@ -328,8 +329,6 @@ done:
>>>   static noinline
>>>   bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
>>>   {
>>> -    long count;
>>> -
>>>       rcu_read_lock();
>>>       while (sem->owner == owner) {
>>>           /*
>>> @@ -350,16 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
>>>       }
>>>       rcu_read_unlock();
>>>
>>> -    if (READ_ONCE(sem->owner))
>>> -        return true; /* new owner, continue spinning */
>>> -
>>>       /*
>>> -     * When the owner is not set, the lock could be free or
>>> -     * held by readers. Check the counter to verify the
>>> -     * state.
>>> +     * If there is a new owner or the owner is not set, we continue
>>> +     * spinning.
>>>        */
>>> -    count = READ_ONCE(sem->count);
>>> -    return (count == 0 || count == RWSEM_WAITING_BIAS);
>>> +    return !rwsem_is_reader_owned(READ_ONCE(sem->owner));
>> It doesn't make sense to force reload sem->owner here; if sem->owner
>> is not being reloaded then the loop above will execute forever.
> 
> I agree that we don't actually need to use READ_ONCE() here for sem->owner as the barrier() call will force the reloading. It is more like a habit to use it for public variable or we will have to think a bit harder to make sure that we are doing the right thing.

I look at that code and I think, "What am I missing that it needs to reload at exit?"
Extra cruft is just as misleading.

What matters while spinning on owner is that the reload is forced as part of the
loop, not at exit time. The whole point of rwsem_spin_on_owner() is to spin
until the owner _changes_; there's no need to get a more recent owner value
than the one that caused the loop to break.


>> Arguably, this check should be bumped out to the optimistic spin and
>> reload/check the owner there?
>>
>> Or better yet; don't pass the owner in as a parameter at all, but
>> instead snapshot the owner and check its ownership on entry.
> 
> That will make the main optimistic spinning loop more complex.

??

Simpler.

	while (rwsem_spin_on_owner(sem)) {
		if (rwsem_try_write_lock_unqueued(sem)) {
			taken = true;
			break;
		}

		if (!sem->owner && (need_resched() || rt_task(current)))
			break;

		cpu_relax_lowlatency();
	}




bool rwsem_spin_on_owner(struct rw_semaphore *sem)
{
	struct task_struct *owner = READ_ONCE(sem->owner);

	if (!rwsem_is_writer_owned(owner))
		return false;

	rcu_read_lock();
	while (sem->owner == owner) {
		....
	}
	rcu_read_unlock();
	
	return !rwsem_is_reader_owned(sem->owner);
}


>>
>> Because see below...
>>
>>>   }
>>>
>>>   static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>>> @@ -378,7 +372,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>>>
>>>       while (true) {
>>>           owner = READ_ONCE(sem->owner);
>>> -        if (owner && !rwsem_spin_on_owner(sem, owner))
>>> +        if (rwsem_is_writer_owned(owner) &&
>>> +           !rwsem_spin_on_owner(sem, owner))
>>>               break;
>>>
>>>           /* wait_lock will be acquired if write_lock is obtained */
>>> @@ -391,9 +386,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>>>            * When there's no owner, we might have preempted between the
>>>            * owner acquiring the lock and setting the owner field. If
>>>            * we're an RT task that will live-lock because we won't let
>>> -         * the owner complete.
>>> +         * the owner complete. We also quit if the lock is owned by
>>> +         * readers.
>>>            */
>>> -        if (!owner && (need_resched() || rt_task(current)))
>>> +        if (rwsem_is_reader_owned(owner) ||
>>> +           (!owner && (need_resched() || rt_task(current))))
>> This is using the stale owner value that was cached before spinning on the owner;
>> That can't be right.
> 
> The code is going to loop back and reload the new owner value anyway. It is just a bit of additional latency. I will move the is_reader check up after loading sem->owner to clear any confusion.

Well, why do it at all then?  Just before attempting the lock, rwsem_spin_on_owner()
determined that the owner was not a reader. If a reader became the new owner after
that, the loop will discover that the lock is no longer writer owned anyway.

Either way, the check here should be to sem->owner (ie., like I wrote above),
but there's no need to force reload here either.

Regards,
Peter Hurley


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-12 21:27     ` Peter Hurley
@ 2016-05-12 23:13       ` Waiman Long
  0 siblings, 0 replies; 33+ messages in thread
From: Waiman Long @ 2016-05-12 23:13 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch

On 05/12/2016 05:27 PM, Peter Hurley wrote:
>>> Arguably, this check should be bumped out to the optimistic spin and
>>> reload/check the owner there?
>>>
>>> Or better yet; don't pass the owner in as a parameter at all, but
>>> instead snapshot the owner and check its ownership on entry.
>> That will make the main optimistic spinning loop more complex.
> ??
>
> Simpler.
>
> 	while (rwsem_spin_on_owner(sem)) {
> 		if (rwsem_try_write_lock_unqueued(sem)) {
> 			taken = true;
> 			break;
> 		}
>
> 		if (!sem->owner && (need_resched() || rt_task(current)))
> 			break;
>
> 		cpu_relax_lowlatency();
> 	}
>
>
>
>
> bool rwsem_spin_on_owner(struct rw_semaphore *sem)
> {
> 	struct task_struct *owner = READ_ONCE(sem->owner);
>
> 	if (!rwsem_is_writer_owned(owner))
> 		return false;
>
> 	rcu_read_lock();
> 	while (sem->owner == owner) {
> 		....
> 	}
> 	rcu_read_unlock();
> 	
> 	return !rwsem_is_reader_owned(sem->owner);
> }
>

I have been thinking about something like that, but my inclination is to 
make as few changes as possible to the existing patch. I did add a new 
patch to streamline the code as suggested.

Cheers,
Longman


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-11 22:04 ` Peter Hurley
  2016-05-12 20:15   ` Waiman Long
@ 2016-05-13 15:07   ` Peter Zijlstra
  2016-05-13 17:58     ` Peter Hurley
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2016-05-13 15:07 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Waiman Long, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch

On Wed, May 11, 2016 at 03:04:20PM -0700, Peter Hurley wrote:
> > +	return !rwsem_is_reader_owned(READ_ONCE(sem->owner));
> 
> It doesn't make sense to force reload sem->owner here; if sem->owner
> is not being reloaded then the loop above will execute forever.
> 
> Arguably, this check should be bumped out to the optimistic spin and
> reload/check the owner there?
> 

Note that barrier() and READ_ONCE() have overlapping but not identical
results and the combined use actually makes sense here.

Yes, a barrier() anywhere in the loop will force a reload of the
variable, _however_ it doesn't force that reload to not suffer from
load tearing.

Using volatile also forces a reload, but in addition ensures the load
cannot be torn IFF it is of machine word size and naturally aligned.

So while the READ_ONCE() here is pointless for forcing the reload (that's
already ensured), we still need to make sure the load isn't torn.
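
As a rough userspace illustration of the difference (READ_ONCE() is
approximated here by a plain volatile cast; this is not the kernel's
exact definition):

#include <stdio.h>

/*
 * Simplified stand-in for the kernel macro: a single volatile access,
 * which for a naturally aligned word can neither be cached across loop
 * iterations nor torn into smaller loads.
 */
#define READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))

struct rwsem_like { void *owner; };

/*
 * Spin until the snapshotted owner changes.  With a plain sem->owner
 * load (no volatile, no barrier()), the compiler could legally hoist
 * the load out of the loop and spin forever on a stale value.
 */
static void spin_until_owner_changes(struct rwsem_like *sem, void *old)
{
	while (READ_ONCE(sem->owner) == old)
		;	/* the real loop calls cpu_relax_lowlatency() here */
}

int main(void)
{
	struct rwsem_like sem = { .owner = (void *)1UL };

	sem.owner = NULL;	/* pretend another CPU just released the lock */
	spin_until_owner_changes(&sem, (void *)1UL);
	puts("owner changed");
	return 0;
}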


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-13 15:07   ` Peter Zijlstra
@ 2016-05-13 17:58     ` Peter Hurley
  2016-05-15 14:47       ` Waiman Long
  2016-05-16 11:09       ` Peter Zijlstra
  0 siblings, 2 replies; 33+ messages in thread
From: Peter Hurley @ 2016-05-13 17:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch

On 05/13/2016 08:07 AM, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 03:04:20PM -0700, Peter Hurley wrote:
>>> +	return !rwsem_is_reader_owned(READ_ONCE(sem->owner));
>>
>> It doesn't make sense to force reload sem->owner here; if sem->owner
>> is not being reloaded then the loop above will execute forever.
>>
>> Arguably, this check should be bumped out to the optimistic spin and
>> reload/check the owner there?
>>
> 
> Note that barrier() and READ_ONCE() have overlapping but not identical
> results and the combined use actually makes sense here.
> 
> Yes, a barrier() anywhere in the loop will force a reload of the
> variable, _however_ it doesn't force that reload to not suffer from
> load tearing.
> 
> Using volatile also forces a reload, but also ensures the load cannot
> be torn IFF it is of machine word side and naturally aligned.
> 
> So while the READ_ONCE() here is pointless for forcing the reload;
> that's already ensured, we still need to make sure the load isn't torn.

If load tearing a naturally aligned pointer is a real code generation
possibility then the rcu list code is broken too (which loads ->next
directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).

For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
that had to do with control dependencies and not load tearing.

OTOH, this patch might actually produce store-tearing:

+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+	/*
+	 * We check the owner value first to make sure that we will only
+	 * do a write to the rwsem cacheline when it is really necessary
+	 * to minimize cacheline contention.
+	 */
+	if (sem->owner != RWSEM_READER_OWNED)
+		sem->owner = RWSEM_READER_OWNED;
+}


Regards,
Peter Hurley


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-13 17:58     ` Peter Hurley
@ 2016-05-15 14:47       ` Waiman Long
  2016-05-16 11:09       ` Peter Zijlstra
  1 sibling, 0 replies; 33+ messages in thread
From: Waiman Long @ 2016-05-15 14:47 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch

On 05/13/2016 01:58 PM, Peter Hurley wrote:
> On 05/13/2016 08:07 AM, Peter Zijlstra wrote:
>> On Wed, May 11, 2016 at 03:04:20PM -0700, Peter Hurley wrote:
>>>> +	return !rwsem_is_reader_owned(READ_ONCE(sem->owner));
>>> It doesn't make sense to force reload sem->owner here; if sem->owner
>>> is not being reloaded then the loop above will execute forever.
>>>
>>> Arguably, this check should be bumped out to the optimistic spin and
>>> reload/check the owner there?
>>>
>> Note that barrier() and READ_ONCE() have overlapping but not identical
>> results and the combined use actually makes sense here.
>>
>> Yes, a barrier() anywhere in the loop will force a reload of the
>> variable, _however_ it doesn't force that reload to not suffer from
>> load tearing.
>>
>> Using volatile also forces a reload, but also ensures the load cannot
>> be torn IFF it is of machine word side and naturally aligned.
>>
>> So while the READ_ONCE() here is pointless for forcing the reload;
>> that's already ensured, we still need to make sure the load isn't torn.
> If load tearing a naturally aligned pointer is a real code generation
> possibility then the rcu list code is broken too (which loads ->next
> directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).
>
> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
> that had to do with control dependencies and not load tearing.
>
> OTOH, this patch might actually produce store-tearing:
>
> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
> +{
> +	/*
> +	 * We check the owner value first to make sure that we will only
> +	 * do a write to the rwsem cacheline when it is really necessary
> +	 * to minimize cacheline contention.
> +	 */
> +	if (sem->owner != RWSEM_READER_OWNED)
> +		sem->owner = RWSEM_READER_OWNED;
> +}
>
>
> Regards,
> Peter Hurley

Load tearing in the argument to rwsem_is_reader_owned() isn't an issue, 
as the wrong decision won't do any harm. Store tearing as identified 
above can be a problem, though. I will fix that even though the chance 
of the compiler generating store-tearing code is really small.

Cheers,
Longman


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-13 17:58     ` Peter Hurley
  2016-05-15 14:47       ` Waiman Long
@ 2016-05-16 11:09       ` Peter Zijlstra
  2016-05-16 12:17         ` Paul E. McKenney
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2016-05-16 11:09 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Waiman Long, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch,
	Paul McKenney

On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
> > Note that barrier() and READ_ONCE() have overlapping but not identical
> > results and the combined use actually makes sense here.
> > 
> > Yes, a barrier() anywhere in the loop will force a reload of the
> > variable, _however_ it doesn't force that reload to not suffer from
> > load tearing.
> > 
> > Using volatile also forces a reload, but also ensures the load cannot
> > be torn IFF it is of machine word side and naturally aligned.
> > 
> > So while the READ_ONCE() here is pointless for forcing the reload;
> > that's already ensured, we still need to make sure the load isn't torn.
> 
> If load tearing a naturally aligned pointer is a real code generation
> possibility then the rcu list code is broken too (which loads ->next
> directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).
> 
> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
> that had to do with control dependencies and not load tearing.

Well, Paul is the one who started the whole load/store tearing thing, so
I suppose he knows what he's doing.

That said, it's fairly recent as these things go, so lots of code hasn't
been updated yet. It's also a very unlikely thing for a compiler to do,
since it mostly doesn't make sense to emit multiple instructions where
one will do, so it's not a very high priority thing either.

But from what I understand, the compiler is free to emit all kinds of
nonsense for !volatile loads/stores.

> OTOH, this patch might actually produce store-tearing:
> 
> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
> +{
> +	/*
> +	 * We check the owner value first to make sure that we will only
> +	 * do a write to the rwsem cacheline when it is really necessary
> +	 * to minimize cacheline contention.
> +	 */
> +	if (sem->owner != RWSEM_READER_OWNED)
> +		sem->owner = RWSEM_READER_OWNED;
> +}

Correct; which is why we should always use {READ,WRITE}_ONCE() for
anything that is used locklessly.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-16 11:09       ` Peter Zijlstra
@ 2016-05-16 12:17         ` Paul E. McKenney
  2016-05-16 14:17           ` Peter Hurley
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-05-16 12:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Peter Hurley, Waiman Long, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Jason Low, Dave Chinner, Scott J Norton,
	Douglas Hatch, kcc, dvyukov

On Mon, May 16, 2016 at 01:09:48PM +0200, Peter Zijlstra wrote:
> On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
> > > Note that barrier() and READ_ONCE() have overlapping but not identical
> > > results and the combined use actually makes sense here.
> > > 
> > > Yes, a barrier() anywhere in the loop will force a reload of the
> > > variable, _however_ it doesn't force that reload to not suffer from
> > > load tearing.
> > > 
> > > Using volatile also forces a reload, but also ensures the load cannot
> > > be torn IFF it is of machine word side and naturally aligned.
> > > 
> > > So while the READ_ONCE() here is pointless for forcing the reload;
> > > that's already ensured, we still need to make sure the load isn't torn.
> > 
> > If load tearing a naturally aligned pointer is a real code generation
> > possibility then the rcu list code is broken too (which loads ->next
> > directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).
> > 
> > For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
> > that had to do with control dependencies and not load tearing.
> 
> Well, Paul is the one who started the whole load/store tearing thing, so
> I suppose he knows what he's doing.

That had to do with suppressing false positives for one of Dmitry
Vjukov's concurrency checkers.  I suspect that Peter Hurley is right
that continued use of that checker would identify other places needing
READ_ONCE(), but from what I understand that is on hold pending a formal
definition of the Linux-kernel memory model.  (KCC and Dmitry (CCed)
can correct me if I am confused on this point.)

> That said; its a fairly recent as things go so lots of code hasn't been
> updated yet, and its also a very unlikely thing for a compiler to do;
> since it mostly doesn't make sense to emit multiple instructions where
> one will do, so its not a very high priority thing either.
> 
> But from what I understand, the compiler is free to emit all kinds of
> nonsense for !volatile loads/stores.

That is quite true.  :-/

> > OTOH, this patch might actually produce store-tearing:
> > 
> > +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
> > +{
> > +	/*
> > +	 * We check the owner value first to make sure that we will only
> > +	 * do a write to the rwsem cacheline when it is really necessary
> > +	 * to minimize cacheline contention.
> > +	 */
> > +	if (sem->owner != RWSEM_READER_OWNED)
> > +		sem->owner = RWSEM_READER_OWNED;
> > +}
> 
> Correct; which is why we should always use {READ,WRITE}_ONCE() for
> anything that is used locklessly.

Completely agreed.  Improve readability of code by flagging lockless
shared-memory accesses, help checkers better find bugs, and prevent the
occasional compiler mischief!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-16 12:17         ` Paul E. McKenney
@ 2016-05-16 14:17           ` Peter Hurley
  2016-05-16 17:22             ` Paul E. McKenney
  2016-05-16 17:50             ` Peter Zijlstra
  0 siblings, 2 replies; 33+ messages in thread
From: Peter Hurley @ 2016-05-16 14:17 UTC (permalink / raw)
  To: paulmck, Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch, kcc,
	dvyukov

On 05/16/2016 05:17 AM, Paul E. McKenney wrote:
> On Mon, May 16, 2016 at 01:09:48PM +0200, Peter Zijlstra wrote:
>> On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
>>>> Note that barrier() and READ_ONCE() have overlapping but not identical
>>>> results and the combined use actually makes sense here.
>>>>
>>>> Yes, a barrier() anywhere in the loop will force a reload of the
>>>> variable, _however_ it doesn't force that reload to not suffer from
>>>> load tearing.
>>>>
>>>> Using volatile also forces a reload, but also ensures the load cannot
>>>> be torn IFF it is of machine word side and naturally aligned.
>>>>
>>>> So while the READ_ONCE() here is pointless for forcing the reload;
>>>> that's already ensured, we still need to make sure the load isn't torn.
>>>
>>> If load tearing a naturally aligned pointer is a real code generation
>>> possibility then the rcu list code is broken too (which loads ->next
>>> directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).
>>>
>>> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
>>> that had to do with control dependencies and not load tearing.
>>
>> Well, Paul is the one who started the whole load/store tearing thing, so
>> I suppose he knows what he's doing.
> 
> That had to do with suppressing false positives for one of Dmitry
> Vjukov's concurrency checkers.  I suspect that Peter Hurley is right
> that continued use of that checker would identify other places needing
> READ_ONCE(), but from what I understand that is on hold pending a formal
> definition of the Linux-kernel memory model.  (KCC and Dmitry (CCed)
> can correct my if I am confused on this point.)
> 
>> That said; its a fairly recent as things go so lots of code hasn't been
>> updated yet, and its also a very unlikely thing for a compiler to do;
>> since it mostly doesn't make sense to emit multiple instructions where
>> one will do, so its not a very high priority thing either.
>>
>> But from what I understand, the compiler is free to emit all kinds of
>> nonsense for !volatile loads/stores.
> 
> That is quite true.  :-/
> 
>>> OTOH, this patch might actually produce store-tearing:
>>>
>>> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
>>> +{
>>> +	/*
>>> +	 * We check the owner value first to make sure that we will only
>>> +	 * do a write to the rwsem cacheline when it is really necessary
>>> +	 * to minimize cacheline contention.
>>> +	 */
>>> +	if (sem->owner != RWSEM_READER_OWNED)
>>> +		sem->owner = RWSEM_READER_OWNED;
>>> +}
>>
>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>> anything that is used locklessly.
> 
> Completely agreed.  Improve readability of code by flagging lockless
> shared-memory accesses, help checkers better find bugs, and prevent the
> occasional compiler mischief!

I think this would be a mistake for 3 reasons:

1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
   of any normally-atomic type (char/int/long/void*), then _every_ access
   would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
   of compiler optimization (eg. eliding redundant loads) where it would
   otherwise be possible.

2. Makes a mess of otherwise readable code.

3. Error-prone; ie., easy to overlook in review.

There is no practical difference between _always_ using READ_ONCE()/WRITE_ONCE()
(to prevent tearing) and declaring the field volatile.

So we've come full-circle from volatile-considered-harmful.

Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-16 14:17           ` Peter Hurley
@ 2016-05-16 17:22             ` Paul E. McKenney
  2016-05-17 19:46               ` Peter Hurley
  2016-05-16 17:50             ` Peter Zijlstra
  1 sibling, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-05-16 17:22 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Peter Zijlstra, Waiman Long, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Jason Low, Dave Chinner, Scott J Norton,
	Douglas Hatch, kcc, dvyukov, dhowells

On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:
> On 05/16/2016 05:17 AM, Paul E. McKenney wrote:
> > On Mon, May 16, 2016 at 01:09:48PM +0200, Peter Zijlstra wrote:
> >> On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
> >>>> Note that barrier() and READ_ONCE() have overlapping but not identical
> >>>> results and the combined use actually makes sense here.
> >>>>
> >>>> Yes, a barrier() anywhere in the loop will force a reload of the
> >>>> variable, _however_ it doesn't force that reload to not suffer from
> >>>> load tearing.
> >>>>
> >>>> Using volatile also forces a reload, but also ensures the load cannot
> >>>> be torn IFF it is of machine word side and naturally aligned.
> >>>>
> >>>> So while the READ_ONCE() here is pointless for forcing the reload;
> >>>> that's already ensured, we still need to make sure the load isn't torn.
> >>>
> >>> If load tearing a naturally aligned pointer is a real code generation
> >>> possibility then the rcu list code is broken too (which loads ->next
> >>> directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).
> >>>
> >>> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
> >>> that had to do with control dependencies and not load tearing.
> >>
> >> Well, Paul is the one who started the whole load/store tearing thing, so
> >> I suppose he knows what he's doing.
> > 
> > That had to do with suppressing false positives for one of Dmitry
> > Vjukov's concurrency checkers.  I suspect that Peter Hurley is right
> > that continued use of that checker would identify other places needing
> > READ_ONCE(), but from what I understand that is on hold pending a formal
> > definition of the Linux-kernel memory model.  (KCC and Dmitry (CCed)
> > can correct my if I am confused on this point.)
> > 
> >> That said; its a fairly recent as things go so lots of code hasn't been
> >> updated yet, and its also a very unlikely thing for a compiler to do;
> >> since it mostly doesn't make sense to emit multiple instructions where
> >> one will do, so its not a very high priority thing either.
> >>
> >> But from what I understand, the compiler is free to emit all kinds of
> >> nonsense for !volatile loads/stores.
> > 
> > That is quite true.  :-/
> > 
> >>> OTOH, this patch might actually produce store-tearing:
> >>>
> >>> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
> >>> +{
> >>> +	/*
> >>> +	 * We check the owner value first to make sure that we will only
> >>> +	 * do a write to the rwsem cacheline when it is really necessary
> >>> +	 * to minimize cacheline contention.
> >>> +	 */
> >>> +	if (sem->owner != RWSEM_READER_OWNED)
> >>> +		sem->owner = RWSEM_READER_OWNED;
> >>> +}
> >>
> >> Correct; which is why we should always use {READ,WRITE}_ONCE() for
> >> anything that is used locklessly.
> > 
> > Completely agreed.  Improve readability of code by flagging lockless
> > shared-memory accesses, help checkers better find bugs, and prevent the
> > occasional compiler mischief!
> 
> I think this would be a mistake for 3 reasons:
> 
> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
>    of any normally-atomic type (char/int/long/void*), then _every_ access
>    would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
>    of compiler optimization (eg. eliding redundant loads) where it would
>    otherwise be possible.

The point about eliding redundant loads is a good one, at least in those
cases where it is a reasonable optimization.  Should we ever get to a
point where we no longer use pre-C11 compilers, those use cases could
potentially use memory_order_relaxed loads.  Preferably wrappered in
something that can be typed with fewer characters.  And it could of course
lead to an interesting discussion of what use cases would be required
to justify this change, but what else is new?

> 2. Makes a mess of otherwise readable code.
> 
> 3. Error-prone; ie., easy to overlook in review.

But #2 and #3 are at odds with each other.  It is all too easy to miss a
critically important load or store that has not been flagged in some way.
So #2's readable code can easily be problematic, as the concurrency is
hidden from both the compiler and the poor developer reading the code.

> There is no practical difference between _always_ using READ_ONCE()/WRITE_ONCE()
> (to prevent tearing) and declaring the field volatile.

Actually, yes there is a difference.  If you hold the update-side lock,
you don't have to use READ_ONCE() when reading the variable.  If you
have further excluded readers (for example, at initialization time or
at teardown time), then you don't have to use either READ_ONCE() or
WRITE_ONCE().
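
As a sketch of that distinction (foo and foo_lock are hypothetical
names; the value is also read locklessly elsewhere):

	/* Update side: foo_lock held, but lockless readers exist. */
	spin_lock(&foo_lock);
	WRITE_ONCE(foo, new_val);
	spin_unlock(&foo_lock);

	/* Reader holding foo_lock: a plain load is fine. */
	spin_lock(&foo_lock);
	val = foo;
	spin_unlock(&foo_lock);

	/* Lockless reader: READ_ONCE() is needed. */
	val = READ_ONCE(foo);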

> So we've come full-circle from volatile-considered-harmful.

Not really.  We are (hopefully) using volatile for jobs that it can do.
In contrast, in the past people were expecting it to do more than it
reasonably can do.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-16 14:17           ` Peter Hurley
  2016-05-16 17:22             ` Paul E. McKenney
@ 2016-05-16 17:50             ` Peter Zijlstra
  2016-05-17 19:15               ` Peter Hurley
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2016-05-16 17:50 UTC (permalink / raw)
  To: Peter Hurley
  Cc: paulmck, Waiman Long, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch, kcc,
	dvyukov

On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:

> >> Correct; which is why we should always use {READ,WRITE}_ONCE() for
> >> anything that is used locklessly.
> > 
> > Completely agreed.  Improve readability of code by flagging lockless
> > shared-memory accesses, help checkers better find bugs, and prevent the
> > occasional compiler mischief!
> 
> I think this would be a mistake for 3 reasons:
> 
> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
>    of any normally-atomic type (char/int/long/void*), then _every_ access
>    would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
>    of compiler optimization (eg. eliding redundant loads) where it would
>    otherwise be possible.

Should not really be a problem I think; you already have to be very
careful when doing lockless stuff.

> 2. Makes a mess of otherwise readable code.

We have to disagree here; I think code with {READ,WRITE}_ONCE() is more
readable, as their presence is a clear indication something special is
up.

> 3. Error-prone; ie., easy to overlook in review.

lockless stuff is error prone by nature; what's your point?

> There is no practical difference between _always_ using READ_ONCE()/WRITE_ONCE()
> (to prevent tearing) and declaring the field volatile.

There is, declaring the field volatile doesn't make it stand out in the
code while reading at all; it also doesn't allow you to omit the
volatile qualifier in places where it doesn't matter.

The whole "variables aren't volatile, accesses are" thing is still very
much in effect.

> So we've come full-circle from volatile-considered-harmful.

I don't think so; the cases that document talks about are still very
much relevant and volatile should not be used for that.

But yes, our understanding of both the memory model and the
language/compiler has come a long way since then.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-16 17:50             ` Peter Zijlstra
@ 2016-05-17 19:15               ` Peter Hurley
  2016-05-17 19:46                 ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Peter Hurley @ 2016-05-17 19:15 UTC (permalink / raw)
  To: Peter Zijlstra, paulmck
  Cc: Waiman Long, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Dave Chinner, Scott J Norton, Douglas Hatch, kcc,
	dvyukov

On 05/16/2016 10:50 AM, Peter Zijlstra wrote:
> On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:
> 
>>>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>>>> anything that is used locklessly.
>>>
>>> Completely agreed.  Improve readability of code by flagging lockless
>>> shared-memory accesses, help checkers better find bugs, and prevent the
>>> occasional compiler mischief!
>>
>> I think this would be a mistake for 3 reasons:
>>
>> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
>>    of any normally-atomic type (char/int/long/void*), then _every_ access
>>    would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
>>    of compiler optimization (eg. eliding redundant loads) where it would
>>    otherwise be possible.
> 
> Should not really be a problem I think; you already have to be very
> careful when doing lockless stuff.

I think you may not understand my point here.

Consider normal list processing.

Paul added a READ_ONCE() load of head->next to list_empty().
That's because list_empty() is being used locklessly in lots of places,
not just for RCU lists.

Ok, so the load won't be torn. But the store to ->next can be, so Paul
added WRITE_ONCE() to some but not all those list primitives too, even
though those aren't being used locklessly.

Note that the compiler _cannot_ optimize those stores even though
they're being used with locks held; IOW, exactly the situation that
volatile-considered-harmful rails against.

Similarly for the load in list_empty() even if it's used with locks held.

As Paul pointed out in his reply, there is no way to address this right now.

What's really needed is separate, not overloaded semantics:

1. "*If* you load or store this value, do it with single memory access, but
    feel free to optimize (elide load/defer store/hoist load/whatever)."

2. "No, seriously, load/store this value regardless of what you think you
    know." == READ_ONCE/WRITE_ONCE


If we start adding READ_ONCE/WRITE_ONCE to every lockless load/store,
worse code will be generated even though the vast majority of current use
is safe.


>> 2. Makes a mess of otherwise readable code.
> 
> We have to disagree here; I think code with {READ,WRITE}_ONCE() is more
> readable, as their presence is a clear indication something special is
> up.

I think you and Paul may wildly underestimate the scale of lockless
use in the kernel not currently annotated by READ_ONCE()/WRITE_ONCE().

Even in primitives designed for lockless concurrency like
rwsem and seqlock and sched/core, much less higher order code like
ipc/msg.c and vfs.

The majority of this lockless use is safe if the assumption
that loads/stores are performed atomically for char/short/int/long/void*
holds; in other words, usage #1 above.


>> 3. Error-prone; ie., easy to overlook in review.
> 
> lockless stuff is error prone by nature; what's your point?

That lockless use is very common, even outside core functionality, and
needlessly splattering READ_ONCE/WRITE_ONCE when the existing code
generation is already safe, will not make it better. And that it will be
no safer by adding READ_ONCE/WRITE_ONCE because plenty of accesses will
be overlooked, like the stores to sem->owner are now.


>> There is no practical difference between _always_ using READ_ONCE()/WRITE_ONCE()
>> (to prevent tearing) and declaring the field volatile.
> 
> There is, declaring the field volatile doesn't make it stand out in the
> code while reading at all; it also doesn't allow you to omit the
> volatile qualifier in places where it doesn't matter.
> 
> The whole; variables aren't volatile, accesses are, thing is still very
> much in effect.

I'm not really seriously arguing for volatile declarations; I'm pointing
out that READ_ONCE()/WRITE_ONCE() has overloaded meaning that is
counter-productive.

For example, you point out that it doesn't allow you to omit the
volatile qualifier but yet we're adding it to underlying primitives
that may or may not be used locklessly.

Guaranteed list_empty() and list_add() are used way more than INIT_LIST_HEAD().


>> So we've come full-circle from volatile-considered-harmful.
> 
> I don't think so; the cases that document talks about are still very
> much relevant and volatile should not be used for that.

The example below is clearly wrong now.

	"Another situation where one might be tempted to use volatile is
	when the processor is busy-waiting on the value of a variable.  The right
	way to perform a busy wait is:

	    while (my_variable != what_i_want)
        	cpu_relax();

	The cpu_relax() call can lower CPU power consumption or yield to a
	hyperthreaded twin processor; it also happens to serve as a compiler
	barrier, so, once again, volatile is unnecessary.  Of course, busy-
	waiting is generally an anti-social act to begin with."


The fact that cpu_relax() is a barrier is immaterial; the load of my_variable
must now be READ_ONCE() to prevent load-tearing.

Likewise, whatever is writing to 'my_variable' must now be WRITE_ONCE().
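
That is, by this argument the quoted example would now have to be
written along these lines (my_variable and what_i_want as in the
quoted text):

	while (READ_ONCE(my_variable) != what_i_want)
		cpu_relax();

	/* ... and whoever releases the waiter ... */
	WRITE_ONCE(my_variable, what_i_want);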

Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-17 19:15               ` Peter Hurley
@ 2016-05-17 19:46                 ` Paul E. McKenney
  2016-05-18 11:05                   ` Peter Zijlstra
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-05-17 19:46 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Peter Zijlstra, Waiman Long, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Jason Low, Dave Chinner, Scott J Norton,
	Douglas Hatch, kcc, dvyukov, dhowells

On Tue, May 17, 2016 at 12:15:20PM -0700, Peter Hurley wrote:
> On 05/16/2016 10:50 AM, Peter Zijlstra wrote:
> > On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:
> > 
> >>>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
> >>>> anything that is used locklessly.
> >>>
> >>> Completely agreed.  Improve readability of code by flagging lockless
> >>> shared-memory accesses, help checkers better find bugs, and prevent the
> >>> occasional compiler mischief!
> >>
> >> I think this would be a mistake for 3 reasons:
> >>
> >> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
> >>    of any normally-atomic type (char/int/long/void*), then _every_ access
> >>    would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
> >>    of compiler optimization (eg. eliding redundant loads) where it would
> >>    otherwise be possible.
> > 
> > Should not really be a problem I think; you already have to be very
> > careful when doing lockless stuff.
> 
> I think you may not understand my point here.
> 
> Consider normal list processing.
> 
> Paul added a READ_ONCE() load of head->next to list_empty().
> That's because list_empty() is being used locklessly in lots of places,
> not just for RCU lists.
> 
> Ok, so the load won't be torn. But the store to ->next can, so Paul
> added WRITE_ONCE() to some but not all those list primitives too, even
> though those aren't being used locklessly.
> 
> Note that the compiler _cannot_ optimize those stores even though
> they're being used with locks held; IOW, exactly the situation that
> volatile-consider-harmful rails against.
> 
> Similarly for the load in list_empty() even if it's used with locks held.
> 
> As Paul pointed out in his reply, there is no way to address this right now.

Actually, if you show a case where this makes a visible system-wide
difference, you could create a set of primitives for #1 below.  Have
a compiler version check, and if it is an old compiler, map them to
READ_ONCE() and WRITE_ONCE(), otherwise as follows, though preferably
with better names:

#define READ_NOTEAR(x) __atomic_load_n(&(x), __ATOMIC_RELAXED)
#define WRITE_NOTEAR(x, v) __atomic_store_n(&(x), (v), __ATOMIC_RELAXED)

The ambiguity between "no tear" and "not ear" should help motivate a
better choice of name.
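
A minimal sketch of that fallback (assuming the kernel's GCC_VERSION
macro and that the __atomic builtins are available from GCC 4.7 on):

#if defined(GCC_VERSION) && GCC_VERSION >= 40700
#define READ_NOTEAR(x)		__atomic_load_n(&(x), __ATOMIC_RELAXED)
#define WRITE_NOTEAR(x, v)	__atomic_store_n(&(x), (v), __ATOMIC_RELAXED)
#else
#define READ_NOTEAR(x)		READ_ONCE(x)
#define WRITE_NOTEAR(x, v)	WRITE_ONCE(x, v)
#endif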

My guess is that the best justification for adding these will come
from the TINY effort.  My addition of ATOMIC_READ() and ATOMIC_WRITE()
did increase the size.  But perhaps there is a benchmark that can see
the difference in performance, response time, or what have you.

The theoreticians won't like it much, as this will introduce the
theoretical possibility of "out of thin air" values, but then again,
there are other theoreticians working on this problem.

> What's really needed is separate, not overloaded semantics:
> 
> 1. "*If* you load or store this value, do it with single memory access, but
>     feel free to optimize (elide load/defer store/hoist load/whatever)."
> 
> 2. "No, seriously, load/store this value regardless of what you think you
>     know." == READ_ONCE/WRITE_ONCE
> 
> 
> If we start adding READ_ONCE/WRITE_ONCE to every lockless load/store,
> worse code will be generated even though the vast majority of current use
> is safe.

There will indeed be some degradation in theory.  I would be surprised
if it is visible at the system level, aside from tiny kernels.  That said,
I have been surprised before.

> >> 2. Makes a mess of otherwise readable code.
> > 
> > We have to disagree here; I think code with {READ,WRITE}_ONCE() is more
> > readable, as their presence is a clear indication something special is
> > up.
> 
> I think you and Paul may wildly underestimate the scale of lockless
> use in the kernel not currently annotated by READ_ONCE()/WRITE_ONCE().

Believe me, I know that READ_ONCE() and WRITE_ONCE() usage would need
to increase by -at- -least- an order of magnitude in order to cover
the cases.  Many of them might be harmless with current compilers, but
compilers have had a habit of getting more aggressive over time.

> Even in primitives designed for lockless concurrency like
> rwsem and seqlock and sched/core, much less higher order code like
> ipc/msg.c and vfs.
> 
> The majority of this lockless use is safe if the assumption
> that loads/stores are performed atomically for char/short/int/long/void*
> hold; in other words usage #1 above.
> 
> 
> >> 3. Error-prone; ie., easy to overlook in review.
> > 
> > lockless stuff is error prone by nature; what's your point?
> 
> That lockless use is very common, even outside core functionality, and
> needlessly splattering READ_ONCE/WRITE_ONCE when the existing code
> generation is already safe, will not make it better. And that it will be
> no safer by adding READ_ONCE/WRITE_ONCE because plenty of accesses will
> be overlooked, like the stores to sem->owner are now.

I expect mechanical aids will help locate the overlooked accesses.

> >> There is no practical difference between _always_ using READ_ONCE()/WRITE_ONCE()
> >> (to prevent tearing) and declaring the field volatile.
> > 
> > There is, declaring the field volatile doesn't make it stand out in the
> > code while reading at all; it also doesn't allow you to omit the
> > volatile qualifier in places where it doesn't matter.
> > 
> > The whole; variables aren't volatile, accesses are, thing is still very
> > much in effect.
> 
> I'm not really seriously arguing for volatile declarations; I'm pointing
> out that READ_ONCE()/WRITE_ONCE() has overloaded meaning that is
> counter-productive.
> 
> For example, you point out that it doesn't allow you to omit the
> volatile qualifier but yet we're adding it to underlying primitives
> that may or may not be used locklessly.
> 
> Guaranteed list_empty() and list_add() are used way more than INIT_LIST_HEAD().

No argument on usage frequency, just a question on visible effects.

> >> So we've come full-circle from volatile-considered-harmful.
> > 
> > I don't think so; the cases that document talks about are still very
> > much relevant and volatile should not be used for that.
> 
> The example below is clearly wrong now.
> 
> 	"Another situation where one might be tempted to use volatile is
> 	when the processor is busy-waiting on the value of a variable.  The right
> 	way to perform a busy wait is:
> 
> 	    while (my_variable != what_i_want)
>         	cpu_relax();
> 
> 	The cpu_relax() call can lower CPU power consumption or yield to a
> 	hyperthreaded twin processor; it also happens to serve as a compiler
> 	barrier, so, once again, volatile is unnecessary.  Of course, busy-
> 	waiting is generally an anti-social act to begin with."
> 
> 
> The fact that cpu_relax() is a barrier is immaterial; the load of my_variable
> must now be READ_ONCE() to prevent load-tearing.
> 
> Likewise, whatever is writing to 'my_variable' must now be WRITE_ONCE().

Yow!  That document has not been changed since 2010.  I bet that there
is much else in it that needs a bit of help.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-16 17:22             ` Paul E. McKenney
@ 2016-05-17 19:46               ` Peter Hurley
  2016-05-17 19:53                 ` Peter Hurley
  0 siblings, 1 reply; 33+ messages in thread
From: Peter Hurley @ 2016-05-17 19:46 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, Waiman Long, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Jason Low, Dave Chinner, Scott J Norton,
	Douglas Hatch, kcc, dvyukov, dhowells

On 05/16/2016 10:22 AM, Paul E. McKenney wrote:
> On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:
>> On 05/16/2016 05:17 AM, Paul E. McKenney wrote:
>>> On Mon, May 16, 2016 at 01:09:48PM +0200, Peter Zijlstra wrote:
>>>> On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
>>>>>> Note that barrier() and READ_ONCE() have overlapping but not identical
>>>>>> results and the combined use actually makes sense here.
>>>>>>
>>>>>> Yes, a barrier() anywhere in the loop will force a reload of the
>>>>>> variable, _however_ it doesn't force that reload to not suffer from
>>>>>> load tearing.
>>>>>>
>>>>>> Using volatile also forces a reload, but also ensures the load cannot
>>>>>> be torn IFF it is of machine word side and naturally aligned.
>>>>>>
>>>>>> So while the READ_ONCE() here is pointless for forcing the reload;
>>>>>> that's already ensured, we still need to make sure the load isn't torn.
>>>>>
>>>>> If load tearing a naturally aligned pointer is a real code generation
>>>>> possibility then the rcu list code is broken too (which loads ->next
>>>>> directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).
>>>>>
>>>>> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
>>>>> that had to do with control dependencies and not load tearing.
>>>>
>>>> Well, Paul is the one who started the whole load/store tearing thing, so
>>>> I suppose he knows what he's doing.
>>>
>>> That had to do with suppressing false positives for one of Dmitry
>>> Vjukov's concurrency checkers.  I suspect that Peter Hurley is right
>>> that continued use of that checker would identify other places needing
>>> READ_ONCE(), but from what I understand that is on hold pending a formal
>>> definition of the Linux-kernel memory model.  (KCC and Dmitry (CCed)
>>> can correct my if I am confused on this point.)
>>>
>>>> That said; its a fairly recent as things go so lots of code hasn't been
>>>> updated yet, and its also a very unlikely thing for a compiler to do;
>>>> since it mostly doesn't make sense to emit multiple instructions where
>>>> one will do, so its not a very high priority thing either.
>>>>
>>>> But from what I understand, the compiler is free to emit all kinds of
>>>> nonsense for !volatile loads/stores.
>>>
>>> That is quite true.  :-/
>>>
>>>>> OTOH, this patch might actually produce store-tearing:
>>>>>
>>>>> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
>>>>> +{
>>>>> +	/*
>>>>> +	 * We check the owner value first to make sure that we will only
>>>>> +	 * do a write to the rwsem cacheline when it is really necessary
>>>>> +	 * to minimize cacheline contention.
>>>>> +	 */
>>>>> +	if (sem->owner != RWSEM_READER_OWNED)
>>>>> +		sem->owner = RWSEM_READER_OWNED;
>>>>> +}
>>>>
>>>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>>>> anything that is used locklessly.
>>>
>>> Completely agreed.  Improve readability of code by flagging lockless
>>> shared-memory accesses, help checkers better find bugs, and prevent the
>>> occasional compiler mischief!
>>
>> I think this would be a mistake for 3 reasons:
>>
>> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
>>    of any normally-atomic type (char/int/long/void*), then _every_ access
>>    would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
>>    of compiler optimization (eg. eliding redundant loads) where it would
>>    otherwise be possible.
> 
> The point about eliding redundant loads is a good one, at least in those
> cases where it is a reasonable optimization.  Should we ever get to a
> point where we no longer use pre-C11 compilers, those use cases could
> potentially use memory_order_relaxed loads.  Preferably wrappered in
> something that can be typed with fewer characters.  And it could of course
> lead to an interesting discussion of what use cases would be required
> to justify this change, but what else is new?

I believe lockless access is quite widespread in the kernel, and this
use was based on the previous assumption that loads/stores to
char/short/int/long/void* are atomic, which is generally safe in the
absence of specific circumstances which may cause load- or store-tearing
(are there others besides immediate stores and packed structures?).

So I think it makes more sense to annotate usage that prevents load-
and store-tearing separately from the forcible load/store READ_ONCE/
WRITE_ONCE macros.


>> 2. Makes a mess of otherwise readable code.
>>
>> 3. Error-prone; ie., easy to overlook in review.
> 
> But #2 and #3 are at odds with each other.  It is all too easy to miss a
> critically important load or store that has not been flagged in some way.
> So #2's readable code can easily be problematic, as the concurrency is
> hidden from both the compiler and the poor developer reading the code.

Not for the purpose of preventing load- and store-tearing; ie., the
vast majority of lockless use now.


>> There is no practical difference between _always_ using READ_ONCE()/WRITE_ONCE()
>> (to prevent tearing) and declaring the field volatile.
> 
> Actually, yes there is a difference.  If you hold the update-side lock,
> you don't have to use READ_ONCE() when reading the variable.  If you
> have further excluded readers (for example, at initialization time or
> at teardown time), then you don't have to use either READ_ONCE() or
> WRITE_ONCE().

This cuts both ways; on the one hand, you're saying using volatile modifier
doesn't let us control every use case, and on the other hand, we're adding
volatile access to list primitives that we _know_ are both frequently
used and in update-side locks. Where's the win?


>> So we've come full-circle from volatile-considered-harmful.
> 
> Not really.  We are (hopefully) using volatile for jobs that it can do.
> In contrast, in the past people were expecting it to do more than it
> reasonably can do.

Well, I wasn't referring to the never-did-work ideas so much as to
the example I quoted from that document about cpu_relax() being a barrier.

Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-17 19:46               ` Peter Hurley
@ 2016-05-17 19:53                 ` Peter Hurley
  0 siblings, 0 replies; 33+ messages in thread
From: Peter Hurley @ 2016-05-17 19:53 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, Waiman Long, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Jason Low, Dave Chinner, Scott J Norton,
	Douglas Hatch, kcc, dvyukov, dhowells

Hi Paul,

You can disregard this as I think we're talking about
the same things in the other email thread.

Regards,
Peter Hurley


On 05/17/2016 12:46 PM, Peter Hurley wrote:
> On 05/16/2016 10:22 AM, Paul E. McKenney wrote:
>> On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:
>>> On 05/16/2016 05:17 AM, Paul E. McKenney wrote:
>>>> On Mon, May 16, 2016 at 01:09:48PM +0200, Peter Zijlstra wrote:
>>>>> On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
>>>>>>> Note that barrier() and READ_ONCE() have overlapping but not identical
>>>>>>> results and the combined use actually makes sense here.
>>>>>>>
>>>>>>> Yes, a barrier() anywhere in the loop will force a reload of the
>>>>>>> variable, _however_ it doesn't force that reload to not suffer from
>>>>>>> load tearing.
>>>>>>>
>>>>>>> Using volatile also forces a reload, but also ensures the load cannot
>>>>>>> be torn IFF it is of machine word side and naturally aligned.
>>>>>>>
>>>>>>> So while the READ_ONCE() here is pointless for forcing the reload;
>>>>>>> that's already ensured, we still need to make sure the load isn't torn.
>>>>>>
>>>>>> If load tearing a naturally aligned pointer is a real code generation
>>>>>> possibility then the rcu list code is broken too (which loads ->next
>>>>>> directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).
>>>>>>
>>>>>> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
>>>>>> that had to do with control dependencies and not load tearing.
>>>>>
>>>>> Well, Paul is the one who started the whole load/store tearing thing, so
>>>>> I suppose he knows what he's doing.
>>>>
>>>> That had to do with suppressing false positives for one of Dmitry
>>>> Vjukov's concurrency checkers.  I suspect that Peter Hurley is right
>>>> that continued use of that checker would identify other places needing
>>>> READ_ONCE(), but from what I understand that is on hold pending a formal
>>>> definition of the Linux-kernel memory model.  (KCC and Dmitry (CCed)
>>>> can correct my if I am confused on this point.)
>>>>
>>>>> That said; its a fairly recent as things go so lots of code hasn't been
>>>>> updated yet, and its also a very unlikely thing for a compiler to do;
>>>>> since it mostly doesn't make sense to emit multiple instructions where
>>>>> one will do, so its not a very high priority thing either.
>>>>>
>>>>> But from what I understand, the compiler is free to emit all kinds of
>>>>> nonsense for !volatile loads/stores.
>>>>
>>>> That is quite true.  :-/
>>>>
>>>>>> OTOH, this patch might actually produce store-tearing:
>>>>>>
>>>>>> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
>>>>>> +{
>>>>>> +	/*
>>>>>> +	 * We check the owner value first to make sure that we will only
>>>>>> +	 * do a write to the rwsem cacheline when it is really necessary
>>>>>> +	 * to minimize cacheline contention.
>>>>>> +	 */
>>>>>> +	if (sem->owner != RWSEM_READER_OWNED)
>>>>>> +		sem->owner = RWSEM_READER_OWNED;
>>>>>> +}
>>>>>
>>>>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>>>>> anything that is used locklessly.
>>>>
>>>> Completely agreed.  Improve readability of code by flagging lockless
>>>> shared-memory accesses, help checkers better find bugs, and prevent the
>>>> occasional compiler mischief!
>>>
>>> I think this would be a mistake for 3 reasons:
>>>
>>> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
>>>    of any normally-atomic type (char/int/long/void*), then _every_ access
>>>    would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
>>>    of compiler optimization (eg. eliding redundant loads) where it would
>>>    otherwise be possible.
>>
>> The point about eliding redundant loads is a good one, at least in those
>> cases where it is a reasonable optimization.  Should we ever get to a
>> point where we no longer use pre-C11 compilers, those use cases could
>> potentially use memory_order_relaxed loads.  Preferably wrappered in
>> something that can be typed with fewer characters.  And it could of course
>> lead to an interesting discussion of what use cases would be required
>> to justify this change, but what else is new?
> 
> I believe lockless access is quite widespread in the kernel, and this
> use was based on the previous assumption that loads/stores to
> char/short/int/long/void* are atomic, which is generally safe in the
> absence of specific circumstances which may cause load- or store-tearing
> (are there others besides immediate stores and packed structures?).
> 
> So I think it makes more sense to annotate usage that prevents load-
> and store-tearing, separately from the forceably load/store READ_ONCE/
> WRITE_ONCE macros.
> 
> 
>>> 2. Makes a mess of otherwise readable code.
>>>
>>> 3. Error-prone; ie., easy to overlook in review.
>>
>> But #2 and #3 are at odds with each other.  It is all too easy to miss a
>> critically important load or store that has not been flagged in some way.
>> So #2's readable code can easily be problematic, as the concurrency is
>> hidden from both the compiler and the poor developer reading the code.
> 
> Not for the purpose of preventing load- and store-tearing; ie., the
> vast majority of lockless use now.
> 
> 
>>> There is no practical difference between _always_ using READ_ONCE()/WRITE_ONCE()
>>> (to prevent tearing) and declaring the field volatile.
>>
>> Actually, yes there is a difference.  If you hold the update-side lock,
>> you don't have to use READ_ONCE() when reading the variable.  If you
>> have further excluded readers (for example, at initialization time or
>> at teardown time), then you don't have to use either READ_ONCE() or
>> WRITE_ONCE().
> 
> This cuts both ways; on the one hand, you're saying using volatile modifier
> doesn't let us control every use case, and on the other hand, we're adding
> volatile access to list primitives that we _know_ are both frequently
> used and in update-side locks. Where's the win?
> 
> 
>>> So we've come full-circle from volatile-considered-harmful.
>>
>> Not really.  We are (hopefully) using volatile for jobs that it can do.
>> In contrast, in the past people were expecting it to do more than it
>> reasonably can do.
> 
> Well, I wasn't referring to the never-did-work ideas and more about
> the example I quoted from that document about cpu_relax() being a barrier.
> 
> Regards,
> Peter Hurley
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-17 19:46                 ` Paul E. McKenney
@ 2016-05-18 11:05                   ` Peter Zijlstra
  2016-05-18 15:56                     ` Waiman Long
  2016-05-18 17:26                     ` Paul E. McKenney
  0 siblings, 2 replies; 33+ messages in thread
From: Peter Zijlstra @ 2016-05-18 11:05 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Hurley, Waiman Long, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Jason Low, Dave Chinner, Scott J Norton,
	Douglas Hatch, kcc, dvyukov, dhowells

On Tue, May 17, 2016 at 12:46:07PM -0700, Paul E. McKenney wrote:
> Actually, if you show a case where this makes a visible system-wide
> difference, you could create a set of primitives for #1 below.  Have
> a compiler version check, and if it is an old compiler, map them to
> READ_ONCE() and WRITE_ONCE(), otherwise as follows, though preferably
> with better names:
> 
> #define READ_NOTEAR(x) __atomic_load_n(&(x), __ATOMIC_RELAXED)
> #define WRITE_NOTEAR(x, v) __atomic_store_n(&(x), (v), __ATOMIC_RELAXED)
> 
> The ambiguity between "no tear" and "not ear" should help motivate a
> better choice of name.

Alternatively, could we try and talk to our GCC friends to make sure GCC
doesn't tear loads/stores irrespective of what the C language spec
allows?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-18 11:05                   ` Peter Zijlstra
@ 2016-05-18 15:56                     ` Waiman Long
  2016-05-18 17:28                       ` Paul E. McKenney
  2016-05-18 17:26                     ` Paul E. McKenney
  1 sibling, 1 reply; 33+ messages in thread
From: Waiman Long @ 2016-05-18 15:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Peter Hurley, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Jason Low, Dave Chinner, Scott J Norton,
	Douglas Hatch, kcc, dvyukov, dhowells

On 05/18/2016 07:05 AM, Peter Zijlstra wrote:
> On Tue, May 17, 2016 at 12:46:07PM -0700, Paul E. McKenney wrote:
>> Actually, if you show a case where this makes a visible system-wide
>> difference, you could create a set of primitives for #1 below.  Have
>> a compiler version check, and if it is an old compiler, map them to
>> READ_ONCE() and WRITE_ONCE(), otherwise as follows, though preferably
>> with better names:
>>
>> #define READ_NOTEAR(x) __atomic_load_n(&(x), __ATOMIC_RELAXED)
>> #define WRITE_NOTEAR(x, v) __atomic_store_n(&(x), (v), __ATOMIC_RELAXED)
>>
>> The ambiguity between "no tear" and "not ear" should help motivate a
>> better choice of name.
> Alternatively, could we try and talk to our GCC friends to make sure GCC
> doesn't tear loads/stores irrespective of what the C language spec
> allows?
>
>

Maybe the GCC guys can define a tag which can be set in variable or 
structure field declarations to indicate that those variables or fields 
have to be read from or written to atomically. This can allow critical 
data that are used by multiple CPUs to be handled correctly while 
allowing the compiler the freedom to do what it sees fit for the less 
critical data. This approach is also easier than looking for all the 
places where the data items are accessed and modifying them.
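
For comparison, C11 can already express something per-field along these
lines with _Atomic plus relaxed accesses, though the kernel does not use
C11 atomics (sketch only; names are made up):

	/* needs <stdatomic.h> */
	struct example {
		_Atomic(struct task_struct *) owner;	/* untorn by definition */
	};

	/* relaxed: single, untorn accesses, but no ordering implied */
	atomic_store_explicit(&ex->owner, new_owner, memory_order_relaxed);
	owner = atomic_load_explicit(&ex->owner, memory_order_relaxed);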

Cheers,
Longman

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-18 11:05                   ` Peter Zijlstra
  2016-05-18 15:56                     ` Waiman Long
@ 2016-05-18 17:26                     ` Paul E. McKenney
  2016-05-19  9:00                       ` Peter Zijlstra
  1 sibling, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-05-18 17:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Peter Hurley, Waiman Long, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Jason Low, Dave Chinner, Scott J Norton,
	Douglas Hatch, kcc, dvyukov, dhowells

On Wed, May 18, 2016 at 01:05:55PM +0200, Peter Zijlstra wrote:
> On Tue, May 17, 2016 at 12:46:07PM -0700, Paul E. McKenney wrote:
> > Actually, if you show a case where this makes a visible system-wide
> > difference, you could create a set of primitives for #1 below.  Have
> > a compiler version check, and if it is an old compiler, map them to
> > READ_ONCE() and WRITE_ONCE(), otherwise as follows, though preferably
> > with better names:
> > 
> > #define READ_NOTEAR(x) __atomic_load_n(&(x), __ATOMIC_RELAXED)
> > #define WRITE_NOTEAR(x, v) __atomic_store_n(&(x), (v), __ATOMIC_RELAXED)
> > 
> > The ambiguity between "no tear" and "not ear" should help motivate a
> > better choice of name.
> 
> Alternatively, could we try and talk to our GCC friends to make sure GCC
> doesn't tear loads/stores irrespective of what the C language spec
> allows?

Interestingly enough, they used to make that guarantee, but removed it
when C11 showed up.

Me, I would feel better explicitly telling the compiler what I needed.
It is all too easy for bugs to slip in otherwise, especially when the
gcc guys are adding exciting new optimizations.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-18 15:56                     ` Waiman Long
@ 2016-05-18 17:28                       ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2016-05-18 17:28 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Peter Hurley, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Jason Low, Dave Chinner, Scott J Norton,
	Douglas Hatch, kcc, dvyukov, dhowells

On Wed, May 18, 2016 at 11:56:39AM -0400, Waiman Long wrote:
> On 05/18/2016 07:05 AM, Peter Zijlstra wrote:
> >On Tue, May 17, 2016 at 12:46:07PM -0700, Paul E. McKenney wrote:
> >>Actually, if you show a case where this makes a visible system-wide
> >>difference, you could create a set of primitives for #1 below.  Have
> >>a compiler version check, and if it is an old compiler, map them to
> >>READ_ONCE() and WRITE_ONCE(), otherwise as follows, though preferably
> >>with better names:
> >>
> >>#define READ_NOTEAR(x) __atomic_load_n(&(x), __ATOMIC_RELAXED)
> >>#define WRITE_NOTEAR(x, v) __atomic_store_n(&(x), (v), __ATOMIC_RELAXED)
> >>
> >>The ambiguity between "no tear" and "not ear" should help motivate a
> >>better choice of name.
> >Alternatively, could we try and talk to our GCC friends to make sure GCC
> >doesn't tear loads/stores irrespective of what the C language spec
> >allows?
> 
> Maybe the GCC guys can define a tag which can be set in the variable
> or structure field declarations that those variables or field have
> to be read from or written to atomically. This can allow critical
> data that are used by multiple CPUs to be handled correctly while
> allowing compiler the freedom to do what it sees fit for the less
> critical data. This approach is also easier than looking for all the
> places where the data items are accessed and modifying them.

Having such a tag on gcc's internal data structures is what I want,
regardless of exactly how that tag gets there.

The sorts of things that I am especially concerned about are cases
where there is a union or other type-punning involved, and gcc has
the pieces of the desired location already in registers.  Of course,
these pieces came from separate accesses.  We really don't need this
kind of thing tripping us up!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-07  0:20 [PATCH v2] locking/rwsem: Add reader-owned state to the owner field Waiman Long
                   ` (4 preceding siblings ...)
  2016-05-11 22:04 ` Peter Hurley
@ 2016-05-19  1:37 ` Dave Chinner
  2016-05-19  8:32   ` Peter Zijlstra
  2016-05-20 22:56   ` Waiman Long
  5 siblings, 2 replies; 33+ messages in thread
From: Dave Chinner @ 2016-05-19  1:37 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Scott J Norton, Douglas Hatch

On Fri, May 06, 2016 at 08:20:24PM -0400, Waiman Long wrote:
> Currently, it is not possible to determine for sure if a reader
> owns a rwsem by looking at the content of the rwsem data structure.
> This patch adds a new state RWSEM_READER_OWNED to the owner field
> to indicate that readers currently own the lock.

Oh, yes please. This will enable us to get rid of the remaining
mrlock rwsem abstraction we've carried since the days of Irix in
XFS. The only reason the abstraction still exists is that we track
write locks for the purposes of checking for correct inode locking
contexts via ASSERT(xfs_isilocked()) calls....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-19  1:37 ` Dave Chinner
@ 2016-05-19  8:32   ` Peter Zijlstra
  2016-05-20 22:56   ` Waiman Long
  1 sibling, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2016-05-19  8:32 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Waiman Long, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Scott J Norton, Douglas Hatch

On Thu, May 19, 2016 at 11:37:53AM +1000, Dave Chinner wrote:
> On Fri, May 06, 2016 at 08:20:24PM -0400, Waiman Long wrote:
> > Currently, it is not possible to determine for sure if a reader
> > owns a rwsem by looking at the content of the rwsem data structure.
> > This patch adds a new state RWSEM_READER_OWNED to the owner field
> > to indicate that readers currently own the lock.
> 
> Oh, yes please. This will enable us to get rid of the remaining
> mrlock rwsem abstraction we've carried since the days of Irix in
> XFS. The only reason the abstraction still exists is that we track
> write locks for the purposes of checking for correct inode locking
> contexts via ASSERT(xfs_isilocked()) calls....

That all seems to live under a DEBUG knob; lockdep could also easily
tell you these things.

---
 include/linux/lockdep.h  | 36 +++++++++++++++++++++++++++++-------
 kernel/locking/lockdep.c | 19 ++++++++++++-------
 2 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index eabe0138eb06..82d6453c4660 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -338,9 +338,21 @@ extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 extern void lock_release(struct lockdep_map *lock, int nested,
 			 unsigned long ip);
 
-#define lockdep_is_held(lock)	lock_is_held(&(lock)->dep_map)
+#define LOCKDEP_HELD_ANY	(-1)
+#define LOCKDEP_HELD_EXCLUSIVE	(0)
+#define LOCKDEP_HELD_SHARED	(1)
+#define LOCKDEP_HELD_RECURSIVE	(2)
 
-extern int lock_is_held(struct lockdep_map *lock);
+extern int _lock_is_held(struct lockdep_map *lock, int read);
+
+static inline int lock_is_held(struct lockdep_map *lock)
+{
+	return _lock_is_held(lock, LOCKDEP_HELD_ANY);
+}
+
+#define lockdep_is_held(lock)		_lock_is_held(&(lock)->dep_map, LOCKDEP_HELD_ANY)
+#define lockdep_is_held_exclusive(lock)	_lock_is_held(&(lock)->dep_map, LOCKDEP_HELD_EXCLUSIVE)
+#define lockdep_is_held_shared(lock)	_lock_is_held(&(lock)->dep_map, LOCKDEP_HELD_SHARED)
 
 extern void lock_set_class(struct lockdep_map *lock, const char *name,
 			   struct lock_class_key *key, unsigned int subclass,
@@ -369,12 +381,20 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie);
 #define lockdep_depth(tsk)	(debug_locks ? (tsk)->lockdep_depth : 0)
 
 #define lockdep_assert_held(l)	do {				\
-		WARN_ON(debug_locks && !lockdep_is_held(l));	\
-	} while (0)
+	WARN_ON(debug_locks && !lockdep_is_held(l));		\
+} while (0)
 
-#define lockdep_assert_held_once(l)	do {				\
-		WARN_ON_ONCE(debug_locks && !lockdep_is_held(l));	\
-	} while (0)
+#define lockdep_assert_held_once(l)	do {			\
+	WARN_ON_ONCE(debug_locks && !lockdep_is_held(l));	\
+} while (0)
+
+#define lockdep_assert_held_exclusive(l) do {			\
+	WARN_ON(debug_locks && !lockdep_is_held_exclusive(l));	\
+} while (0)
+
+#define lockdep_assert_held_shared(l) do {			\
+	WARN_ON(debug_locks && !lockdep_is_held_shared(l));	\
+} while (0)
 
 #define lockdep_recursing(tsk)	((tsk)->lockdep_recursion)
 
@@ -430,6 +450,8 @@ struct lock_class_key { };
 
 #define lockdep_assert_held(l)			do { (void)(l); } while (0)
 #define lockdep_assert_held_once(l)		do { (void)(l); } while (0)
+#define lockdep_assert_held_exclusive(l)	do { (void)(l); } while (0)
+#define lockdep_assert_held_shared(l)		do { (void)(l); } while (0)
 
 #define lockdep_recursing(tsk)			(0)
 
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 81f1a7107c0e..dca5d982b315 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3183,7 +3183,7 @@ print_lock_nested_lock_not_held(struct task_struct *curr,
 	return 0;
 }
 
-static int __lock_is_held(struct lockdep_map *lock);
+static int __lock_is_held(struct lockdep_map *lock, int read);
 
 /*
  * This gets called for every mutex_lock*()/spin_lock*() operation.
@@ -3324,7 +3324,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	}
 	chain_key = iterate_chain_key(chain_key, class_idx);
 
-	if (nest_lock && !__lock_is_held(nest_lock))
+	if (nest_lock && !__lock_is_held(nest_lock, LOCKDEP_HELD_ANY))
 		return print_lock_nested_lock_not_held(curr, hlock, ip);
 
 	if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
@@ -3571,7 +3571,7 @@ __lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
 	return 1;
 }
 
-static int __lock_is_held(struct lockdep_map *lock)
+static int __lock_is_held(struct lockdep_map *lock, int read)
 {
 	struct task_struct *curr = current;
 	int i;
@@ -3579,8 +3579,13 @@ static int __lock_is_held(struct lockdep_map *lock)
 	for (i = 0; i < curr->lockdep_depth; i++) {
 		struct held_lock *hlock = curr->held_locks + i;
 
-		if (match_held_lock(hlock, lock))
+		if (!match_held_lock(hlock, lock))
+			continue;
+
+		if (read == LOCKDEP_HELD_ANY)
 			return 1;
+
+		return read == hlock->read;
 	}
 
 	return 0;
@@ -3764,7 +3769,7 @@ void lock_release(struct lockdep_map *lock, int nested,
 }
 EXPORT_SYMBOL_GPL(lock_release);
 
-int lock_is_held(struct lockdep_map *lock)
+int _lock_is_held(struct lockdep_map *lock, int read)
 {
 	unsigned long flags;
 	int ret = 0;
@@ -3776,13 +3781,13 @@ int lock_is_held(struct lockdep_map *lock)
 	check_flags(flags);
 
 	current->lockdep_recursion = 1;
-	ret = __lock_is_held(lock);
+	ret = __lock_is_held(lock, read);
 	current->lockdep_recursion = 0;
 	raw_local_irq_restore(flags);
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(lock_is_held);
+EXPORT_SYMBOL_GPL(_lock_is_held);
 
 struct pin_cookie lock_pin_lock(struct lockdep_map *lock)
 {

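To illustrate what this buys a user like XFS, a minimal sketch (the
struct and function names below are made up for illustration; the
actual xfs_isilocked() conversion is not part of this patch).  With
lockdep enabled, the new asserts distinguish shared from exclusive
ownership without the caller having to track the writer itself:

#include <linux/rwsem.h>
#include <linux/lockdep.h>

struct foo {
	struct rw_semaphore	lock;
	unsigned long		count;
};

/* Callers must hold f->lock for write. */
static void foo_inc(struct foo *f)
{
	lockdep_assert_held_exclusive(&f->lock);
	f->count++;
}

/*
 * Callers must hold f->lock for read; since the match above is exact,
 * a write holder would also trip this assert.
 */
static unsigned long foo_count(struct foo *f)
{
	lockdep_assert_held_shared(&f->lock);
	return f->count;
}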

* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-18 17:26                     ` Paul E. McKenney
@ 2016-05-19  9:00                       ` Peter Zijlstra
  2016-05-19 13:43                         ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2016-05-19  9:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Hurley, Waiman Long, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Jason Low, Dave Chinner, Scott J Norton,
	Douglas Hatch, kcc, dvyukov, dhowells

On Wed, May 18, 2016 at 10:26:06AM -0700, Paul E. McKenney wrote:
> On Wed, May 18, 2016 at 01:05:55PM +0200, Peter Zijlstra wrote:

> > Alternatively, could we try and talk to our GCC friends to make sure GCC
> > doesn't tear loads/stores irrespective of what the C language spec
> > allows?
> 
> Interestingly enough, they used to make that guarantee, but removed it
> when C11 showed up.

Did someone tell them this was a regression and have them fix it? They
can't just change things like this.

> Me, I would feel better explicitly telling the compiler what I needed.
> It is all too easy for bugs to slip in otherwise, especially when the
> gcc guys are adding exciting new optimizations.

GCC guys (as opposed to the language guys) should be far more amenable
to our needs, and I don't think they want to break the kernel any more
than we do.


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-19  9:00                       ` Peter Zijlstra
@ 2016-05-19 13:43                         ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2016-05-19 13:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Peter Hurley, Waiman Long, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Jason Low, Dave Chinner, Scott J Norton,
	Douglas Hatch, kcc, dvyukov, dhowells

On Thu, May 19, 2016 at 11:00:13AM +0200, Peter Zijlstra wrote:
> On Wed, May 18, 2016 at 10:26:06AM -0700, Paul E. McKenney wrote:
> > On Wed, May 18, 2016 at 01:05:55PM +0200, Peter Zijlstra wrote:
> 
> > > Alternatively, could we try and talk to our GCC friends to make sure GCC
> > > doesn't tear loads/stores irrespective of what the C language spec
> > > allows?
> > 
> > Interestingly enough, they used to make that guarantee, but removed it
> > when C11 showed up.
> 
> Did someone tell them this was a regression and have them fix it? They
> can't just change things like this.

I did, informally.  I was told that the atomics were to replace them.
I have been bugging them about volatile ever since, given that some
people would dearly like to eliminate volatile from the language.  (I
believe I am making good progress on preventing this, with a lot of help
more recently.)
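
To make "explicitly telling the compiler" concrete, a minimal sketch
using the kernel's existing READ_ONCE()/WRITE_ONCE() helpers from
<linux/compiler.h>: they route the access through a volatile cast,
which in practice keeps GCC from tearing or re-loading a properly
aligned, word-sized access -- the very guarantee being discussed here.
(The C11 relaxed atomics are what the standards folks intend as the
replacement, as noted above.)

#include <linux/compiler.h>

struct widget {
	unsigned long state;
};

/* Publish in a single store the compiler may not split or defer. */
static void widget_set_state(struct widget *w, unsigned long v)
{
	WRITE_ONCE(w->state, v);
}

/* Sample the value exactly once; no re-load behind our back. */
static unsigned long widget_get_state(struct widget *w)
{
	return READ_ONCE(w->state);
}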

> > Me, I would feel better explicitly telling the compiler what I needed.
> > It is all too easy for bugs to slip in otherwise, especially when the
> > gcc guys are adding exciting new optimizations.
> 
> GCC guys (as opposed to the language guys) should be far more amenable
> to our needs, and I don't think they want to break the kernel any more
> than we do.

Some are, some aren't.  We should of course cherish the ones who would
like to avoid breaking the kernel.

							Thanx, Paul


* Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field
  2016-05-19  1:37 ` Dave Chinner
  2016-05-19  8:32   ` Peter Zijlstra
@ 2016-05-20 22:56   ` Waiman Long
  1 sibling, 0 replies; 33+ messages in thread
From: Waiman Long @ 2016-05-20 22:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Jason Low, Scott J Norton, Douglas Hatch

On 05/18/2016 09:37 PM, Dave Chinner wrote:
> On Fri, May 06, 2016 at 08:20:24PM -0400, Waiman Long wrote:
>> Currently, it is not possible to determine for sure if a reader
>> owns a rwsem by looking at the content of the rwsem data structure.
>> This patch adds a new state RWSEM_READER_OWNED to the owner field
>> to indicate that readers currently own the lock.
> Oh, yes please. This will enable us to get rid of the remaining
> mrlock rwsem abstraction we've carried since the days of Irix in
> XFS. The only reason the abstraction still exists is that we track
> write locks for the purposes of checking for correct inode locking
> contexts via ASSERT(xfs_isilocked()) calls....

I think we could add code to export this read/write ownership
information. However, that will only work when both
CONFIG_RWSEM_XCHGADD_ALGORITHM and CONFIG_RWSEM_SPIN_ON_OWNER are set,
so the functionality won't be available on architectures that use
CONFIG_RWSEM_GENERIC_SPINLOCK.
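
Something along these lines, say (purely illustrative: the helper name
is made up, RWSEM_READER_OWNED is the marker value this patch stores
in ->owner, and the ->owner field itself only exists when
CONFIG_RWSEM_SPIN_ON_OWNER is set):

#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
/*
 * Best-effort only: ->owner is written after the lock has already
 * been acquired, so a transiently stale or NULL value can be seen.
 */
static inline bool rwsem_is_writer_owned(struct rw_semaphore *sem)
{
	struct task_struct *owner = READ_ONCE(sem->owner);

	return owner && owner != RWSEM_READER_OWNED;
}
#endif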

Cheers,
Longman


end of thread (newest message: 2016-05-20 22:57 UTC)

Thread overview: 33+ messages
2016-05-07  0:20 [PATCH v2] locking/rwsem: Add reader-owned state to the owner field Waiman Long
2016-05-07  4:56 ` Ingo Molnar
2016-05-08  3:04   ` Waiman Long
2016-05-09  8:27 ` Peter Zijlstra
2016-05-10  2:24   ` Waiman Long
2016-05-10  7:02     ` Peter Zijlstra
2016-05-09 18:44 ` Jason Low
2016-05-10 13:03 ` Davidlohr Bueso
2016-05-11 22:04 ` Peter Hurley
2016-05-12 20:15   ` Waiman Long
2016-05-12 21:27     ` Peter Hurley
2016-05-12 23:13       ` Waiman Long
2016-05-13 15:07   ` Peter Zijlstra
2016-05-13 17:58     ` Peter Hurley
2016-05-15 14:47       ` Waiman Long
2016-05-16 11:09       ` Peter Zijlstra
2016-05-16 12:17         ` Paul E. McKenney
2016-05-16 14:17           ` Peter Hurley
2016-05-16 17:22             ` Paul E. McKenney
2016-05-17 19:46               ` Peter Hurley
2016-05-17 19:53                 ` Peter Hurley
2016-05-16 17:50             ` Peter Zijlstra
2016-05-17 19:15               ` Peter Hurley
2016-05-17 19:46                 ` Paul E. McKenney
2016-05-18 11:05                   ` Peter Zijlstra
2016-05-18 15:56                     ` Waiman Long
2016-05-18 17:28                       ` Paul E. McKenney
2016-05-18 17:26                     ` Paul E. McKenney
2016-05-19  9:00                       ` Peter Zijlstra
2016-05-19 13:43                         ` Paul E. McKenney
2016-05-19  1:37 ` Dave Chinner
2016-05-19  8:32   ` Peter Zijlstra
2016-05-20 22:56   ` Waiman Long
