[-next,v2] locking/osq_lock: annotate a data race in osq_lock

Message ID 1581429255-12542-1-git-send-email-cai@lca.pw
State New
Series
  • [-next,v2] locking/osq_lock: annotate a data race in osq_lock

Commit Message

Qian Cai Feb. 11, 2020, 1:54 p.m. UTC
prev->next could be accessed concurrently as noticed by KCSAN,

 write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
  osq_lock+0x25f/0x350
  osq_wait_next at kernel/locking/osq_lock.c:79
  (inlined by) osq_lock at kernel/locking/osq_lock.c:185
  rwsem_optimistic_spin
  <snip>

 read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
  osq_lock+0x196/0x350
  osq_lock at kernel/locking/osq_lock.c:157
  rwsem_optimistic_spin
  <snip>

Since the write only stores NULL to prev->next, and the read merely
tests whether prev->next equals this_cpu_ptr(&osq_node), the code
still works correctly even if the loaded value is torn. Thus, mark it
as an intentional data race using the data_race() macro.

Signed-off-by: Qian Cai <cai@lca.pw>
---

v2: insert some code comments.

 kernel/locking/osq_lock.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
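As context for the annotation, the benign race the commit message describes can be modeled outside the kernel. The following is an illustrative userspace sketch in C11, not kernel code: all names are hypothetical, and relaxed atomics stand in for the kernel's data_race() and the marked write.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's optimistic_spin_node. */
struct spin_node {
	_Atomic(struct spin_node *) next;
};

/* The "unlock" side: the marked write only ever stores NULL. */
static void model_clear_next(struct spin_node *prev)
{
	atomic_store_explicit(&prev->next, NULL, memory_order_relaxed);
}

/*
 * The "lock" side: an opportunistic comparison.  A stale value can
 * only produce a false negative (the caller retries the loop) or a
 * false positive (the subsequent cmpxchg() fails) -- it can never
 * corrupt state, which is why the race is benign.
 */
static int model_next_is(struct spin_node *prev, struct spin_node *node)
{
	return atomic_load_explicit(&prev->next, memory_order_relaxed) == node;
}
```

The point mirrors the commit message: correctness never depends on the racy load observing a fresh value, only on the loop eventually noticing it.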

Comments

Qian Cai May 8, 2020, 8:59 p.m. UTC | #1
> On Feb 11, 2020, at 8:54 AM, Qian Cai <cai@lca.pw> wrote:
> 
> prev->next could be accessed concurrently as noticed by KCSAN,
> 
> write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
>  osq_lock+0x25f/0x350
>  osq_wait_next at kernel/locking/osq_lock.c:79
>  (inlined by) osq_lock at kernel/locking/osq_lock.c:185
>  rwsem_optimistic_spin
>  <snip>
> 
> read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
>  osq_lock+0x196/0x350
>  osq_lock at kernel/locking/osq_lock.c:157
>  rwsem_optimistic_spin
>  <snip>
> 
> Since the write only stores NULL to prev->next and the read tests if
> prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
> shattered, the code is still working correctly. Thus, mark it as an
> intentional data race using the data_race() macro.
> 
> Signed-off-by: Qian Cai <cai@lca.pw>

Hmm, this patch has been dropped from linux-next for some reason.

Paul, can you pick this up along with KCSAN fixes?

https://lore.kernel.org/lkml/1581429255-12542-1-git-send-email-cai@lca.pw/

> ---
> 
> v2: insert some code comments.
> 
> kernel/locking/osq_lock.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 1f7734949ac8..f733bcd99e8a 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> 	 */
> 
> 	for (;;) {
> -		if (prev->next == node &&
> +		/*
> +		 * cpu_relax() below implies a compiler barrier which would
> +		 * prevent this comparison being optimized away.
> +		 */
> +		if (data_race(prev->next == node) &&
> 		    cmpxchg(&prev->next, node, NULL) == node)
> 			break;
> 
> -- 
> 1.8.3.1
>
Paul E. McKenney May 9, 2020, 4:33 a.m. UTC | #2
On Fri, May 08, 2020 at 04:59:05PM -0400, Qian Cai wrote:
> 
> 
> > On Feb 11, 2020, at 8:54 AM, Qian Cai <cai@lca.pw> wrote:
> > 
> > prev->next could be accessed concurrently as noticed by KCSAN,
> > 
> > write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
> >  osq_lock+0x25f/0x350
> >  osq_wait_next at kernel/locking/osq_lock.c:79
> >  (inlined by) osq_lock at kernel/locking/osq_lock.c:185
> >  rwsem_optimistic_spin
> >  <snip>
> > 
> > read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
> >  osq_lock+0x196/0x350
> >  osq_lock at kernel/locking/osq_lock.c:157
> >  rwsem_optimistic_spin
> >  <snip>
> > 
> > Since the write only stores NULL to prev->next and the read tests if
> > prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
> > shattered, the code is still working correctly. Thus, mark it as an
> > intentional data race using the data_race() macro.
> > 
> > Signed-off-by: Qian Cai <cai@lca.pw>
> 
> Hmm, this patch has been dropped from linux-next from some reasons.
> 
> Paul, can you pick this up along with KCSAN fixes?
> 
> https://lore.kernel.org/lkml/1581429255-12542-1-git-send-email-cai@lca.pw/

I have queued it on -rcu, but it is too late for v5.8 via the -rcu tree,
so this is v5.9 at the earliest.  Plus I would need an ack from one of
the locking folks.

							Thanx, Paul

> > ---
> > 
> > v2: insert some code comments.
> > 
> > kernel/locking/osq_lock.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > index 1f7734949ac8..f733bcd99e8a 100644
> > --- a/kernel/locking/osq_lock.c
> > +++ b/kernel/locking/osq_lock.c
> > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > 	 */
> > 
> > 	for (;;) {
> > -		if (prev->next == node &&
> > +		/*
> > +		 * cpu_relax() below implies a compiler barrier which would
> > +		 * prevent this comparison being optimized away.
> > +		 */
> > +		if (data_race(prev->next == node) &&
> > 		    cmpxchg(&prev->next, node, NULL) == node)
> > 			break;
> > 
> > -- 
> > 1.8.3.1
> > 
>
Qian Cai May 9, 2020, 1:01 p.m. UTC | #3
> On May 9, 2020, at 12:33 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Fri, May 08, 2020 at 04:59:05PM -0400, Qian Cai wrote:
>> 
>> 
>>> On Feb 11, 2020, at 8:54 AM, Qian Cai <cai@lca.pw> wrote:
>>> 
>>> prev->next could be accessed concurrently as noticed by KCSAN,
>>> 
>>> write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
>>> osq_lock+0x25f/0x350
>>> osq_wait_next at kernel/locking/osq_lock.c:79
>>> (inlined by) osq_lock at kernel/locking/osq_lock.c:185
>>> rwsem_optimistic_spin
>>> <snip>
>>> 
>>> read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
>>> osq_lock+0x196/0x350
>>> osq_lock at kernel/locking/osq_lock.c:157
>>> rwsem_optimistic_spin
>>> <snip>
>>> 
>>> Since the write only stores NULL to prev->next and the read tests if
>>> prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
>>> shattered, the code is still working correctly. Thus, mark it as an
>>> intentional data race using the data_race() macro.
>>> 
>>> Signed-off-by: Qian Cai <cai@lca.pw>
>> 
>> Hmm, this patch has been dropped from linux-next from some reasons.
>> 
>> Paul, can you pick this up along with KCSAN fixes?
>> 
>> https://lore.kernel.org/lkml/1581429255-12542-1-git-send-email-cai@lca.pw/
> 
> I have queued it on -rcu, but it is too late for v5.8 via the -rcu tree,
> so this is v5.9 at the earliest.  Plus I would need an ack from one of
> the locking folks.

Peter, Will, can you give an ACK? This v2 should incorporate all the feedback from Peter,

https://lore.kernel.org/lkml/20200211124753.GP14914@hirez.programming.kicks-ass.net/

V5.9 is fine. All I care about is that it stays in linux-next (so the testing bots won't trigger this over and over again) and makes it into mainline at some point in the future.

> 
> 							Thanx, Paul
> 
>>> ---
>>> 
>>> v2: insert some code comments.
>>> 
>>> kernel/locking/osq_lock.c | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
>>> index 1f7734949ac8..f733bcd99e8a 100644
>>> --- a/kernel/locking/osq_lock.c
>>> +++ b/kernel/locking/osq_lock.c
>>> @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>>> 	 */
>>> 
>>> 	for (;;) {
>>> -		if (prev->next == node &&
>>> +		/*
>>> +		 * cpu_relax() below implies a compiler barrier which would
>>> +		 * prevent this comparison being optimized away.
>>> +		 */
>>> +		if (data_race(prev->next == node) &&
>>> 		    cmpxchg(&prev->next, node, NULL) == node)
>>> 			break;
>>> 
>>> -- 
>>> 1.8.3.1
Paul E. McKenney May 9, 2020, 4:12 p.m. UTC | #4
On Sat, May 09, 2020 at 09:01:53AM -0400, Qian Cai wrote:
> 
> 
> > On May 9, 2020, at 12:33 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > On Fri, May 08, 2020 at 04:59:05PM -0400, Qian Cai wrote:
> >> 
> >> 
> >>> On Feb 11, 2020, at 8:54 AM, Qian Cai <cai@lca.pw> wrote:
> >>> 
> >>> prev->next could be accessed concurrently as noticed by KCSAN,
> >>> 
> >>> write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
> >>> osq_lock+0x25f/0x350
> >>> osq_wait_next at kernel/locking/osq_lock.c:79
> >>> (inlined by) osq_lock at kernel/locking/osq_lock.c:185
> >>> rwsem_optimistic_spin
> >>> <snip>
> >>> 
> >>> read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
> >>> osq_lock+0x196/0x350
> >>> osq_lock at kernel/locking/osq_lock.c:157
> >>> rwsem_optimistic_spin
> >>> <snip>
> >>> 
> >>> Since the write only stores NULL to prev->next and the read tests if
> >>> prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
> >>> shattered, the code is still working correctly. Thus, mark it as an
> >>> intentional data race using the data_race() macro.
> >>> 
> >>> Signed-off-by: Qian Cai <cai@lca.pw>
> >> 
> >> Hmm, this patch has been dropped from linux-next from some reasons.
> >> 
> >> Paul, can you pick this up along with KCSAN fixes?
> >> 
> >> https://lore.kernel.org/lkml/1581429255-12542-1-git-send-email-cai@lca.pw/
> > 
> > I have queued it on -rcu, but it is too late for v5.8 via the -rcu tree,
> > so this is v5.9 at the earliest.  Plus I would need an ack from one of
> > the locking folks.
> 
> Peter, Will, can you give an ACK? This v2 should incorporate all the feedback from Peter,
> 
> https://lore.kernel.org/lkml/20200211124753.GP14914@hirez.programming.kicks-ass.net/
> 
> V5.9 is fine. All I care about is it is always in linux-next (so the testing bots won’t trigger this over and over again) and to be in mainline at some point in the future.

Ah, and I forgot to ask.  Why "if (data_race(prev->next == node)" instead
of "if (data_race(prev->next) == node"?

							Thanx, Paul

> >>> ---
> >>> 
> >>> v2: insert some code comments.
> >>> 
> >>> kernel/locking/osq_lock.c | 6 +++++-
> >>> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>> 
> >>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> >>> index 1f7734949ac8..f733bcd99e8a 100644
> >>> --- a/kernel/locking/osq_lock.c
> >>> +++ b/kernel/locking/osq_lock.c
> >>> @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> >>> 	 */
> >>> 
> >>> 	for (;;) {
> >>> -		if (prev->next == node &&
> >>> +		/*
> >>> +		 * cpu_relax() below implies a compiler barrier which would
> >>> +		 * prevent this comparison being optimized away.
> >>> +		 */
> >>> +		if (data_race(prev->next == node) &&
> >>> 		    cmpxchg(&prev->next, node, NULL) == node)
> >>> 			break;
> >>> 
> >>> -- 
> >>> 1.8.3.1
>
Qian Cai May 9, 2020, 4:53 p.m. UTC | #5
> On May 9, 2020, at 12:12 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> Ah, and I forgot to ask.  Why "if (data_race(prev->next == node)" instead
> of "if (data_race(prev->next) == node"?

I think the one you suggested is slightly better at pointing out the exact race. Do you want me to resend, or could you squash it instead?
Paul E. McKenney May 9, 2020, 9:36 p.m. UTC | #6
On Sat, May 09, 2020 at 12:53:38PM -0400, Qian Cai wrote:
> 
> 
> > On May 9, 2020, at 12:12 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > Ah, and I forgot to ask.  Why "if (data_race(prev->next == node)" instead
> > of "if (data_race(prev->next) == node"?
> 
> I think the one you suggested is slightly better to point out the exact race. Do you want me to resend or you could squash it instead?

The patch was still at the top of my stack, so I just amended it.  Here
is the updated version.

							Thanx, Paul

------------------------------------------------------------------------

commit 13e69ca01ce1621ce74248bda86cfad47fa5a0fa
Author: Qian Cai <cai@lca.pw>
Date:   Tue Feb 11 08:54:15 2020 -0500

    locking/osq_lock: Annotate a data race in osq_lock
    
    The prev->next pointer can be accessed concurrently as noticed by KCSAN:
    
     write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
      osq_lock+0x25f/0x350
      osq_wait_next at kernel/locking/osq_lock.c:79
      (inlined by) osq_lock at kernel/locking/osq_lock.c:185
      rwsem_optimistic_spin
      <snip>
    
     read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
      osq_lock+0x196/0x350
      osq_lock at kernel/locking/osq_lock.c:157
      rwsem_optimistic_spin
      <snip>
    
    Since the write only stores NULL to prev->next, and the read merely
    tests whether prev->next equals this_cpu_ptr(&osq_node), the code
    still works correctly even if the loaded value is torn. Thus, mark
    it as an intentional data race using the data_race() macro.
    
    Signed-off-by: Qian Cai <cai@lca.pw>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 1f77349..1de006e 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 */
 
 	for (;;) {
-		if (prev->next == node &&
+		/*
+		 * cpu_relax() below implies a compiler barrier which would
+		 * prevent this comparison being optimized away.
+		 */
+		if (data_race(prev->next) == node &&
 		    cmpxchg(&prev->next, node, NULL) == node)
 			break;
Will Deacon May 11, 2020, 3:58 p.m. UTC | #7
On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> On Sat, May 09, 2020 at 12:53:38PM -0400, Qian Cai wrote:
> > 
> > 
> > > On May 9, 2020, at 12:12 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > 
> > > Ah, and I forgot to ask.  Why "if (data_race(prev->next == node)" instead
> > > of "if (data_race(prev->next) == node"?
> > 
> > I think the one you suggested is slightly better to point out the exact race. Do you want me to resend or you could squash it instead?
> 
> The patch was still at the top of my stack, so I just amended it.  Here
> is the updated version.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 13e69ca01ce1621ce74248bda86cfad47fa5a0fa
> Author: Qian Cai <cai@lca.pw>
> Date:   Tue Feb 11 08:54:15 2020 -0500
> 
>     locking/osq_lock: Annotate a data race in osq_lock
>     
>     The prev->next pointer can be accessed concurrently as noticed by KCSAN:
>     
>      write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
>       osq_lock+0x25f/0x350
>       osq_wait_next at kernel/locking/osq_lock.c:79
>       (inlined by) osq_lock at kernel/locking/osq_lock.c:185
>       rwsem_optimistic_spin
>       <snip>
>     
>      read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
>       osq_lock+0x196/0x350
>       osq_lock at kernel/locking/osq_lock.c:157
>       rwsem_optimistic_spin
>       <snip>
>     
>     Since the write only stores NULL to prev->next and the read tests if
>     prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
>     shattered, the code is still working correctly. Thus, mark it as an
>     intentional data race using the data_race() macro.
>     
>     Signed-off-by: Qian Cai <cai@lca.pw>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 1f77349..1de006e 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	 */
>  
>  	for (;;) {
> -		if (prev->next == node &&
> +		/*
> +		 * cpu_relax() below implies a compiler barrier which would
> +		 * prevent this comparison being optimized away.
> +		 */
> +		if (data_race(prev->next) == node &&
>  		    cmpxchg(&prev->next, node, NULL) == node)
>  			break;

I'm fine with the data_race() placement, but I don't find the comment
very helpful. We assign the result of a READ_ONCE() to 'prev' in the
loop, so I don't think that the cpu_relax() is really relevant.

The reason we don't need READ_ONCE() here is because if we race with
the writer then either we'll go round the loop again after accidentally
thinking prev->next != node, or we'll erroneously attempt the cmpxchg()
because we thought they were equal and that will fail.

Make sense?

Will
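Will's argument above can be sketched as a userspace C11 model (illustrative names only; atomic_compare_exchange_strong() stands in for the kernel's cmpxchg()): a stale result from the unsynchronized comparison is always corrected, either by the caller retrying the loop or by the compare-exchange failing.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical queue node, modeling the osq node's next pointer. */
struct qnode {
	_Atomic(struct qnode *) next;
};

/*
 * Try to unlink 'node' from 'prev'.  Returns 1 on success.  The
 * initial comparison is only a hint: if it reads a stale value we
 * either bail out early (false negative => caller retries) or the
 * authoritative compare-exchange fails (false positive).  Either
 * way, no incorrect state change is possible.
 */
static int try_unlink(struct qnode *prev, struct qnode *node)
{
	/* racy hint read: only decides whether we attempt the swap */
	if (atomic_load_explicit(&prev->next, memory_order_relaxed) != node)
		return 0;

	struct qnode *expected = node;
	/* re-checks the value under full synchronization */
	return atomic_compare_exchange_strong(&prev->next, &expected, NULL);
}
```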
Paul E. McKenney May 11, 2020, 4:43 p.m. UTC | #8
On Mon, May 11, 2020 at 04:58:13PM +0100, Will Deacon wrote:
> On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> > On Sat, May 09, 2020 at 12:53:38PM -0400, Qian Cai wrote:
> > > 
> > > 
> > > > On May 9, 2020, at 12:12 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > 
> > > > Ah, and I forgot to ask.  Why "if (data_race(prev->next == node)" instead
> > > > of "if (data_race(prev->next) == node"?
> > > 
> > > I think the one you suggested is slightly better to point out the exact race. Do you want me to resend or you could squash it instead?
> > 
> > The patch was still at the top of my stack, so I just amended it.  Here
> > is the updated version.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 13e69ca01ce1621ce74248bda86cfad47fa5a0fa
> > Author: Qian Cai <cai@lca.pw>
> > Date:   Tue Feb 11 08:54:15 2020 -0500
> > 
> >     locking/osq_lock: Annotate a data race in osq_lock
> >     
> >     The prev->next pointer can be accessed concurrently as noticed by KCSAN:
> >     
> >      write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
> >       osq_lock+0x25f/0x350
> >       osq_wait_next at kernel/locking/osq_lock.c:79
> >       (inlined by) osq_lock at kernel/locking/osq_lock.c:185
> >       rwsem_optimistic_spin
> >       <snip>
> >     
> >      read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
> >       osq_lock+0x196/0x350
> >       osq_lock at kernel/locking/osq_lock.c:157
> >       rwsem_optimistic_spin
> >       <snip>
> >     
> >     Since the write only stores NULL to prev->next and the read tests if
> >     prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
> >     shattered, the code is still working correctly. Thus, mark it as an
> >     intentional data race using the data_race() macro.
> >     
> >     Signed-off-by: Qian Cai <cai@lca.pw>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > index 1f77349..1de006e 100644
> > --- a/kernel/locking/osq_lock.c
> > +++ b/kernel/locking/osq_lock.c
> > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> >  	 */
> >  
> >  	for (;;) {
> > -		if (prev->next == node &&
> > +		/*
> > +		 * cpu_relax() below implies a compiler barrier which would
> > +		 * prevent this comparison being optimized away.
> > +		 */
> > +		if (data_race(prev->next) == node &&
> >  		    cmpxchg(&prev->next, node, NULL) == node)
> >  			break;
> 
> I'm fine with the data_race() placement, but I don't find the comment
> very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> loop, so I don't think that the cpu_relax() is really relevant.

Suppose that the compiler loaded a value that was not equal to "node".
In that case, the cmpxchg() won't happen, so something else must force
the compiler to do the reload in order to avoid an infinite loop, right?
Or am I missing something here?

							Thanx, Paul

> The reason we don't need READ_ONCE() here is because if we race with
> the writer then either we'll go round the loop again after accidentally
> thinking prev->next != node, or we'll erroneously attempt the cmpxchg()
> because we thought they were equal and that will fail.
> 
> Make sense?
> 
> Will
Qian Cai May 11, 2020, 4:44 p.m. UTC | #9
> On May 11, 2020, at 11:58 AM, Will Deacon <will@kernel.org> wrote:
> 
> I'm fine with the data_race() placement, but I don't find the comment
> very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> loop, so I don't think that the cpu_relax() is really relevant.
> 
> The reason we don't need READ_ONCE() here is because if we race with
> the writer then either we'll go round the loop again after accidentally
> thinking prev->next != node, or we'll erroneously attempt the cmpxchg()
> because we thought they were equal and that will fail.
> 
> Make sense?

I think the significant concern from the previous reviews was that the compiler, having no knowledge of the concurrency, could prove that prev->next == node was always true and optimize the whole if statement away, resulting in an infinite loop.

The comment tried to explain that, in theory, the cpu_relax() here saves us from that infinite loop.
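The hoisting concern can be illustrated outside the kernel. In this hypothetical C11 sketch, the relaxed atomic load plays the reload-forcing role that cpu_relax()'s compiler barrier (or READ_ONCE()) plays in the kernel loop:

```c
#include <stdatomic.h>

static int plain_ready;
static _Atomic int atomic_ready;

/*
 * Broken form: nothing forces a reload of plain_ready, so a
 * conforming compiler may read it once, conclude the condition is
 * invariant, and spin forever if the first read saw 0.
 */
static void spin_plain(void)
{
	while (!plain_ready)
		;
}

/*
 * Fixed form: the relaxed atomic load must be re-performed on every
 * iteration, so a concurrent store eventually terminates the loop.
 */
static int spin_atomic(void)
{
	int iters = 0;

	while (!atomic_load_explicit(&atomic_ready, memory_order_relaxed))
		iters++;
	return iters;
}
```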
Will Deacon May 11, 2020, 4:52 p.m. UTC | #10
On Mon, May 11, 2020 at 09:43:19AM -0700, Paul E. McKenney wrote:
> On Mon, May 11, 2020 at 04:58:13PM +0100, Will Deacon wrote:
> > On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > > index 1f77349..1de006e 100644
> > > --- a/kernel/locking/osq_lock.c
> > > +++ b/kernel/locking/osq_lock.c
> > > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > >  	 */
> > >  
> > >  	for (;;) {
> > > -		if (prev->next == node &&
> > > +		/*
> > > +		 * cpu_relax() below implies a compiler barrier which would
> > > +		 * prevent this comparison being optimized away.
> > > +		 */
> > > +		if (data_race(prev->next) == node &&
> > >  		    cmpxchg(&prev->next, node, NULL) == node)
> > >  			break;
> > 
> > I'm fine with the data_race() placement, but I don't find the comment
> > very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> > loop, so I don't think that the cpu_relax() is really relevant.
> 
> Suppose that the compiler loaded a value that was not equal to "node".
> In that case, the cmpxchg() won't happen, so something else must force
> the compiler to do the reload in order to avoid an infinite loop, right?
> Or am I missing something here?

Then we just go round the loop and reload prev:

	prev = READ_ONCE(node->prev);

which should be enough to stop the compiler, no?

Will
Will Deacon May 11, 2020, 4:54 p.m. UTC | #11
On Mon, May 11, 2020 at 12:44:26PM -0400, Qian Cai wrote:
> 
> 
> > On May 11, 2020, at 11:58 AM, Will Deacon <will@kernel.org> wrote:
> > 
> > I'm fine with the data_race() placement, but I don't find the comment
> > very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> > loop, so I don't think that the cpu_relax() is really relevant.
> > 
> > The reason we don't need READ_ONCE() here is because if we race with
> > the writer then either we'll go round the loop again after accidentally
> > thinking prev->next != node, or we'll erroneously attempt the cmpxchg()
> > because we thought they were equal and that will fail.
> > 
> > Make sense?
> 
> I think the significant concern from the previous reviews was if compilers
> could prove that prev->next == node was always true because it had no
> knowledge of the concurrency, and then took out the whole if statement
> away resulting in an infinite loop.

Hmm, I don't see how it can remove the cmpxchg(). Do you have a link to that
discussion, please?

Will
Qian Cai May 11, 2020, 5:10 p.m. UTC | #12
> On May 11, 2020, at 12:54 PM, Will Deacon <will@kernel.org> wrote:
> 
> Hmm, I don't see how it can remove the cmpxchg(). Do you have a link to that
> discussion, please?

lore.kernel.org/lkml/20200211124753.GP14914@hirez.programming.kicks-ass.net

Correction: if the compiler could prove that "prev->next != node" is always true, that cmpxchg() would never run. cpu_relax() should be sufficient to keep that if statement from being optimized away in any case.
Paul E. McKenney May 11, 2020, 5:29 p.m. UTC | #13
On Mon, May 11, 2020 at 05:52:17PM +0100, Will Deacon wrote:
> On Mon, May 11, 2020 at 09:43:19AM -0700, Paul E. McKenney wrote:
> > On Mon, May 11, 2020 at 04:58:13PM +0100, Will Deacon wrote:
> > > On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> > > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > > > index 1f77349..1de006e 100644
> > > > --- a/kernel/locking/osq_lock.c
> > > > +++ b/kernel/locking/osq_lock.c
> > > > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > > >  	 */
> > > >  
> > > >  	for (;;) {
> > > > -		if (prev->next == node &&
> > > > +		/*
> > > > +		 * cpu_relax() below implies a compiler barrier which would
> > > > +		 * prevent this comparison being optimized away.
> > > > +		 */
> > > > +		if (data_race(prev->next) == node &&
> > > >  		    cmpxchg(&prev->next, node, NULL) == node)
> > > >  			break;
> > > 
> > > I'm fine with the data_race() placement, but I don't find the comment
> > > very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> > > loop, so I don't think that the cpu_relax() is really relevant.
> > 
> > Suppose that the compiler loaded a value that was not equal to "node".
> > In that case, the cmpxchg() won't happen, so something else must force
> > the compiler to do the reload in order to avoid an infinite loop, right?
> > Or am I missing something here?
> 
> Then we just go round the loop and reload prev:
> 
> 	prev = READ_ONCE(node->prev);
> 
> which should be enough to stop the compiler, no?

Yes, that would also work.  Either have the cpu_relax() or a barrier()
or whatever on the one hand, or, as you say, turn the data_race() into
a READ_ONCE().  I personally prefer the READ_ONCE() myself, unless that
would undesirably suppress other KCSAN warnings.

							Thanx, Paul
Will Deacon May 11, 2020, 5:34 p.m. UTC | #14
On Mon, May 11, 2020 at 10:29:18AM -0700, Paul E. McKenney wrote:
> On Mon, May 11, 2020 at 05:52:17PM +0100, Will Deacon wrote:
> > On Mon, May 11, 2020 at 09:43:19AM -0700, Paul E. McKenney wrote:
> > > On Mon, May 11, 2020 at 04:58:13PM +0100, Will Deacon wrote:
> > > > On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> > > > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > > > > index 1f77349..1de006e 100644
> > > > > --- a/kernel/locking/osq_lock.c
> > > > > +++ b/kernel/locking/osq_lock.c
> > > > > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > > > >  	 */
> > > > >  
> > > > >  	for (;;) {
> > > > > -		if (prev->next == node &&
> > > > > +		/*
> > > > > +		 * cpu_relax() below implies a compiler barrier which would
> > > > > +		 * prevent this comparison being optimized away.
> > > > > +		 */
> > > > > +		if (data_race(prev->next) == node &&
> > > > >  		    cmpxchg(&prev->next, node, NULL) == node)
> > > > >  			break;
> > > > 
> > > > I'm fine with the data_race() placement, but I don't find the comment
> > > > very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> > > > loop, so I don't think that the cpu_relax() is really relevant.
> > > 
> > > Suppose that the compiler loaded a value that was not equal to "node".
> > > In that case, the cmpxchg() won't happen, so something else must force
> > > the compiler to do the reload in order to avoid an infinite loop, right?
> > > Or am I missing something here?
> > 
> > Then we just go round the loop and reload prev:
> > 
> > 	prev = READ_ONCE(node->prev);
> > 
> > which should be enough to stop the compiler, no?
> 
> Yes, that would also work.  Either have the cpu_relax() or a barrier()
> or whatever on the one hand, or, as you say, turn the data_race() into
> a READ_ONCE().  I personally prefer the READ_ONCE() myself, unless that
> would undesirably suppress other KCSAN warnings.

No, I mean here is the code after this patch is applied:

	for (;;) {
		if (data_race(prev->next) == node &&
		    cmpxchg(&prev->next, node, NULL) == node)
			break;

		/*
		 * We can only fail the cmpxchg() racing against an unlock(),
		 * in which case we should observe @node->locked becomming
		 * true.
		 */
		if (smp_load_acquire(&node->locked))
			return true;

		cpu_relax();

		/*
		 * Or we race against a concurrent unqueue()'s step-B, in which
		 * case its step-C will write us a new @node->prev pointer.
		 */
		prev = READ_ONCE(node->prev);
	}

I'm saying that this READ_ONCE at the end of the loop should be sufficient
to stop the compiler making value assumptions about prev->next. Do you
agree?

Will
Paul E. McKenney May 11, 2020, 6:07 p.m. UTC | #15
On Mon, May 11, 2020 at 06:34:13PM +0100, Will Deacon wrote:
> On Mon, May 11, 2020 at 10:29:18AM -0700, Paul E. McKenney wrote:
> > On Mon, May 11, 2020 at 05:52:17PM +0100, Will Deacon wrote:
> > > On Mon, May 11, 2020 at 09:43:19AM -0700, Paul E. McKenney wrote:
> > > > On Mon, May 11, 2020 at 04:58:13PM +0100, Will Deacon wrote:
> > > > > On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> > > > > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > > > > > index 1f77349..1de006e 100644
> > > > > > --- a/kernel/locking/osq_lock.c
> > > > > > +++ b/kernel/locking/osq_lock.c
> > > > > > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > > > > >  	 */
> > > > > >  
> > > > > >  	for (;;) {
> > > > > > -		if (prev->next == node &&
> > > > > > +		/*
> > > > > > +		 * cpu_relax() below implies a compiler barrier which would
> > > > > > +		 * prevent this comparison being optimized away.
> > > > > > +		 */
> > > > > > +		if (data_race(prev->next) == node &&
> > > > > >  		    cmpxchg(&prev->next, node, NULL) == node)
> > > > > >  			break;
> > > > > 
> > > > > I'm fine with the data_race() placement, but I don't find the comment
> > > > > very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> > > > > loop, so I don't think that the cpu_relax() is really relevant.
> > > > 
> > > > Suppose that the compiler loaded a value that was not equal to "node".
> > > > In that case, the cmpxchg() won't happen, so something else must force
> > > > the compiler to do the reload in order to avoid an infinite loop, right?
> > > > Or am I missing something here?
> > > 
> > > Then we just go round the loop and reload prev:
> > > 
> > > 	prev = READ_ONCE(node->prev);
> > > 
> > > which should be enough to stop the compiler, no?
> > 
> > Yes, that would also work.  Either have the cpu_relax() or a barrier()
> > or whatever on the one hand, or, as you say, turn the data_race() into
> > a READ_ONCE().  I personally prefer the READ_ONCE() myself, unless that
> > would undesirably suppress other KCSAN warnings.
> 
> No, I mean here is the code after this patch is applied:
> 
> 	for (;;) {
> 		if (data_race(prev->next) == node &&
> 		    cmpxchg(&prev->next, node, NULL) == node)
> 			break;
> 
> 		/*
> 		 * We can only fail the cmpxchg() racing against an unlock(),
> 		 * in which case we should observe @node->locked becomming
> 		 * true.
> 		 */
> 		if (smp_load_acquire(&node->locked))
> 			return true;
> 
> 		cpu_relax();
> 
> 		/*
> 		 * Or we race against a concurrent unqueue()'s step-B, in which
> 		 * case its step-C will write us a new @node->prev pointer.
> 		 */
> 		prev = READ_ONCE(node->prev);
> 	}
> 
> I'm saying that this READ_ONCE at the end of the loop should be sufficient
> to stop the compiler making value assumptions about prev->next. Do you
> agree?

Good point, and I would certainly hope so!

							Thanx, Paul

Patch

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 1f7734949ac8..f733bcd99e8a 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -154,7 +154,11 @@  bool osq_lock(struct optimistic_spin_queue *lock)
 	 */
 
 	for (;;) {
-		if (prev->next == node &&
+		/*
+		 * cpu_relax() below implies a compiler barrier which would
+		 * prevent this comparison being optimized away.
+		 */
+		if (data_race(prev->next == node) &&
 		    cmpxchg(&prev->next, node, NULL) == node)
 			break;