locking/osq_lock: fix a data race in osq_wait_next

Message ID 20200122163857.4605-1-cai@lca.pw
State New

Commit Message

Qian Cai Jan. 22, 2020, 4:38 p.m. UTC
KCSAN complains,

 write (marked) to 0xffff941ca3b3be00 of 8 bytes by task 670 on cpu 6:
  osq_lock+0x24c/0x340
  __mutex_lock+0x277/0xd20
  mutex_lock_nested+0x31/0x40
  memcg_create_kmem_cache+0x2e/0x190
  memcg_kmem_cache_create_func+0x40/0x80
  process_one_work+0x54c/0xbe0
  worker_thread+0x80/0x650
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

 read to 0xffff941ca3b3be00 of 8 bytes by task 703 on cpu 44:
  osq_lock+0x18e/0x340
  __mutex_lock+0x277/0xd20
  mutex_lock_nested+0x31/0x40
  memcg_create_kmem_cache+0x2e/0x190
  memcg_kmem_cache_create_func+0x40/0x80
  process_one_work+0x54c/0xbe0
  worker_thread+0x80/0x650
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

which points to those lines in osq_wait_next(),

  next = xchg(&node->next, NULL);
  if (next)
	break;

Since only the read is outside of critical sections, fixed it by adding
a READ_ONCE().

Signed-off-by: Qian Cai <cai@lca.pw>
---
 kernel/locking/osq_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Will Deacon Jan. 22, 2020, 4:59 p.m. UTC | #1
On Wed, Jan 22, 2020 at 11:38:57AM -0500, Qian Cai wrote:
> KCSAN complains,
> 
>  write (marked) to 0xffff941ca3b3be00 of 8 bytes by task 670 on cpu 6:
>   osq_lock+0x24c/0x340
>   __mutex_lock+0x277/0xd20
>   mutex_lock_nested+0x31/0x40
>   memcg_create_kmem_cache+0x2e/0x190
>   memcg_kmem_cache_create_func+0x40/0x80
>   process_one_work+0x54c/0xbe0
>   worker_thread+0x80/0x650
>   kthread+0x1e0/0x200
>   ret_from_fork+0x27/0x50
> 
>  read to 0xffff941ca3b3be00 of 8 bytes by task 703 on cpu 44:
>   osq_lock+0x18e/0x340
>   __mutex_lock+0x277/0xd20
>   mutex_lock_nested+0x31/0x40
>   memcg_create_kmem_cache+0x2e/0x190
>   memcg_kmem_cache_create_func+0x40/0x80
>   process_one_work+0x54c/0xbe0
>   worker_thread+0x80/0x650
>   kthread+0x1e0/0x200
>   ret_from_fork+0x27/0x50
> 
> which points to those lines in osq_wait_next(),
> 
>   next = xchg(&node->next, NULL);
>   if (next)
> 	break;
> 
> Since only the read is outside of critical sections, fixed it by adding
> a READ_ONCE().
> 
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>  kernel/locking/osq_lock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 6ef600aa0f47..8f565165019a 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -77,7 +77,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
>  		 */
>  		if (node->next) {
>  			next = xchg(&node->next, NULL);
> -			if (next)
> +			if (READ_ONCE(next))
>  				break;
>  		}

I don't understand this; 'next' is a local variable.

Not keen on the onslaught of random "add a READ_ONCE() to shut the
sanitiser up" patches we're going to get from kcsan :(

Will
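Will's objection can be made concrete with a small user-space sketch (the READ_ONCE() here is a stand-in volatile-cast macro for illustration, not the kernel's real definition): once the atomic exchange has returned, `next` is a plain local that no other thread can name, so wrapping it in READ_ONCE() changes nothing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* User-space stand-in for the kernel macro (an assumption for
 * illustration; not the kernel's real definition). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

static _Atomic(int *) node_next;

/* Mirrors the shape of the patched hunk: the result of the atomic
 * exchange lands in 'next', a plain local.  No other thread can name
 * a local variable, so READ_ONCE(next) would be a no-op on private
 * storage -- which is the objection above. */
static int *take_next(void)
{
	int *next = atomic_exchange(&node_next, NULL);

	if (next)	/* READ_ONCE(next) here would change nothing */
		return next;
	return NULL;
}
```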
Qian Cai Jan. 22, 2020, 5:08 p.m. UTC | #2
> On Jan 22, 2020, at 11:59 AM, Will Deacon <will@kernel.org> wrote:
> 
> I don't understand this; 'next' is a local variable.
> 
> Not keen on the onslaught of random "add a READ_ONCE() to shut the
> sanitiser up" patches we're going to get from kcsan :(

My fault. I suspect it is node->next. I’ll do a bit more testing to confirm.
Peter Zijlstra Jan. 22, 2020, 5:09 p.m. UTC | #3
On Wed, Jan 22, 2020 at 11:38:57AM -0500, Qian Cai wrote:
> KCSAN complains,
> 
>  write (marked) to 0xffff941ca3b3be00 of 8 bytes by task 670 on cpu 6:
>   osq_lock+0x24c/0x340
>   __mutex_lock+0x277/0xd20
>   mutex_lock_nested+0x31/0x40
>   memcg_create_kmem_cache+0x2e/0x190
>   memcg_kmem_cache_create_func+0x40/0x80
>   process_one_work+0x54c/0xbe0
>   worker_thread+0x80/0x650
>   kthread+0x1e0/0x200
>   ret_from_fork+0x27/0x50
> 
>  read to 0xffff941ca3b3be00 of 8 bytes by task 703 on cpu 44:
>   osq_lock+0x18e/0x340
>   __mutex_lock+0x277/0xd20
>   mutex_lock_nested+0x31/0x40
>   memcg_create_kmem_cache+0x2e/0x190
>   memcg_kmem_cache_create_func+0x40/0x80
>   process_one_work+0x54c/0xbe0
>   worker_thread+0x80/0x650
>   kthread+0x1e0/0x200
>   ret_from_fork+0x27/0x50

That's useless gibberish, at the very least run it through some decode
so we get line numbers.

> which points to those lines in osq_wait_next(),
> 
>   next = xchg(&node->next, NULL);
>   if (next)
> 	break;
> 
> Since only the read is outside of critical sections, fixed it by adding
> a READ_ONCE().

What?!?! Did you actually read what you wrote?

Also, you have to stop calling things fixes unless you can prove (and
explain) there is an actual problem.

> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>  kernel/locking/osq_lock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 6ef600aa0f47..8f565165019a 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -77,7 +77,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
>  		 */
>  		if (node->next) {
>  			next = xchg(&node->next, NULL);
> -			if (next)
> +			if (READ_ONCE(next))
>  				break;
>  		}
>  

This seems to suggest you ought to maybe brush up on your C skills.
Marco Elver Jan. 22, 2020, 10:38 p.m. UTC | #4
On Wed, 22 Jan 2020, Qian Cai wrote:

> 
> 
> > On Jan 22, 2020, at 11:59 AM, Will Deacon <will@kernel.org> wrote:
> > 
> > I don't understand this; 'next' is a local variable.
> > 
> > Not keen on the onslaught of random "add a READ_ONCE() to shut the
> > sanitiser up" patches we're going to get from kcsan :(
> 
> My fault. I suspect it is node->next. I’ll do a bit more testing to confirm.

If possible, decode and get the line numbers. I have observed a data
race in osq_lock before, however, this is the only one I have recently
seen in osq_lock:

read to 0xffff88812c12d3d4 of 4 bytes by task 23304 on cpu 0:
 osq_lock+0x170/0x2f0 kernel/locking/osq_lock.c:143

	while (!READ_ONCE(node->locked)) {
		/*
		 * If we need to reschedule bail... so we can block.
		 * Use vcpu_is_preempted() to avoid waiting for a preempted
		 * lock holder:
		 */
-->		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
			goto unqueue;

		cpu_relax();
	}

where

	static inline int node_cpu(struct optimistic_spin_node *node)
	{
-->		return node->cpu - 1;
	}


write to 0xffff88812c12d3d4 of 4 bytes by task 23334 on cpu 1:
 osq_lock+0x89/0x2f0 kernel/locking/osq_lock.c:99

	bool osq_lock(struct optimistic_spin_queue *lock)
	{
		struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
		struct optimistic_spin_node *prev, *next;
		int curr = encode_cpu(smp_processor_id());
		int old;

		node->locked = 0;
		node->next = NULL;
-->		node->cpu = curr;


Thanks,
-- Marco
Qian Cai Jan. 22, 2020, 11:54 p.m. UTC | #5
> On Jan 22, 2020, at 5:38 PM, Marco Elver <elver@google.com> wrote:
> 
> 
> 
> On Wed, 22 Jan 2020, Qian Cai wrote:
> 
>> 
>> 
>>> On Jan 22, 2020, at 11:59 AM, Will Deacon <will@kernel.org> wrote:
>>> 
>>> I don't understand this; 'next' is a local variable.
>>> 
>>> Not keen on the onslaught of random "add a READ_ONCE() to shut the
>>> sanitiser up" patches we're going to get from kcsan :(
>> 
>> My fault. I suspect it is node->next. I’ll do a bit more testing to confirm.
> 
> If possible, decode and get the line numbers. I have observed a data

[  667.817131] Reported by Kernel Concurrency Sanitizer on:
[  667.823200] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W    L    5.5.0-rc7-next-20200121+ #9
[  667.832839] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[  667.842132] ==================================================================
[  672.299421] ==================================================================
[  672.307449] BUG: KCSAN: data-race in osq_lock / osq_lock

[  672.315741] write (marked) to 0xffff8f613013be00 of 8 bytes by task 971 on cpu 59:
[  672.324085]  osq_lock+0x2fb/0x340 kernel/locking/osq_lock.c:200
[  672.328149]  __mutex_lock+0x277/0xd20 kernel/locking/mutex.c:657
[  672.332561]  mutex_lock_nested+0x31/0x40 kernel/locking/mutex.c:1118
[  672.337236]  memcg_create_kmem_cache+0x2e/0x190 mm/slab_common.c:659
[  672.342534]  memcg_kmem_cache_create_func+0x40/0x80
[  672.348177]  process_one_work+0x54c/0xbe0
[  672.352940]  worker_thread+0x80/0x650
[  672.357351]  kthread+0x1e0/0x200
[  672.361324]  ret_from_fork+0x27/0x50

[  672.367875] read to 0xffff8f613013be00 of 8 bytes by task 708 on cpu 50:
[  672.375345]  osq_lock+0x234/0x340 kernel/locking/osq_lock.c:78
[  672.379431]  __mutex_lock+0x277/0xd20 kernel/locking/mutex.c:657
[  672.383862]  mutex_lock_nested+0x31/0x40 kernel/locking/mutex.c:1118
[  672.388537]  memcg_create_kmem_cache+0x2e/0x190 mm/slab_common.c:659
[  672.393824]  memcg_kmem_cache_create_func+0x40/0x80
[  672.399461]  process_one_work+0x54c/0xbe0
[  672.404229]  worker_thread+0x80/0x650
[  672.408640]  kthread+0x1e0/0x200
[  672.412613]  ret_from_fork+0x27/0x50

This?

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 1f7734949ac8..832e87966dcf 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
                 * wait for either @lock to point to us, through its Step-B, or
                 * wait for a new @node->next from its Step-C.
                 */
-               if (node->next) {
+               if (READ_ONCE(node->next)) {
                        next = xchg(&node->next, NULL);
                        if (next)
                                break;

> race in osq_lock before, however, this is the only one I have recently
> seen in osq_lock:
> 
> read to 0xffff88812c12d3d4 of 4 bytes by task 23304 on cpu 0:
>  osq_lock+0x170/0x2f0 kernel/locking/osq_lock.c:143
> 
> 	while (!READ_ONCE(node->locked)) {
> 		/*
> 		 * If we need to reschedule bail... so we can block.
> 		 * Use vcpu_is_preempted() to avoid waiting for a preempted
> 		 * lock holder:
> 		 */
> -->		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
> 			goto unqueue;
> 
> 		cpu_relax();
> 	}
> 
> where
> 
> 	static inline int node_cpu(struct optimistic_spin_node *node)
> 	{
> -->		return node->cpu - 1;
> 	}
> 
> 
> write to 0xffff88812c12d3d4 of 4 bytes by task 23334 on cpu 1:
> osq_lock+0x89/0x2f0 kernel/locking/osq_lock.c:99
> 
> 	bool osq_lock(struct optimistic_spin_queue *lock)
> 	{
> 		struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> 		struct optimistic_spin_node *prev, *next;
> 		int curr = encode_cpu(smp_processor_id());
> 		int old;
> 
> 		node->locked = 0;
> 		node->next = NULL;
> -->		node->cpu = curr;
> 
> 
> Thanks,
> -- Marco
Peter Zijlstra Jan. 23, 2020, 9:36 a.m. UTC | #6
On Wed, Jan 22, 2020 at 11:38:51PM +0100, Marco Elver wrote:

> If possible, decode and get the line numbers. I have observed a data
> race in osq_lock before, however, this is the only one I have recently
> seen in osq_lock:
> 
> read to 0xffff88812c12d3d4 of 4 bytes by task 23304 on cpu 0:
>  osq_lock+0x170/0x2f0 kernel/locking/osq_lock.c:143
> 
> 	while (!READ_ONCE(node->locked)) {
> 		/*
> 		 * If we need to reschedule bail... so we can block.
> 		 * Use vcpu_is_preempted() to avoid waiting for a preempted
> 		 * lock holder:
> 		 */
> -->		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
> 			goto unqueue;
> 
> 		cpu_relax();
> 	}
> 
> where
> 
> 	static inline int node_cpu(struct optimistic_spin_node *node)
> 	{
> -->		return node->cpu - 1;
> 	}
> 
> 
> write to 0xffff88812c12d3d4 of 4 bytes by task 23334 on cpu 1:
>  osq_lock+0x89/0x2f0 kernel/locking/osq_lock.c:99
> 
> 	bool osq_lock(struct optimistic_spin_queue *lock)
> 	{
> 		struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> 		struct optimistic_spin_node *prev, *next;
> 		int curr = encode_cpu(smp_processor_id());
> 		int old;
> 
> 		node->locked = 0;
> 		node->next = NULL;
> -->		node->cpu = curr;
> 

Yeah, that's impossible. This store happens before the node is
published, so no matter how the load in node_cpu() is shattered, it must
observe the right value.
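Peter's publication argument can be sketched in user-space C11, using release/acquire as stand-ins for the kernel's publication ordering (an assumption for illustration): the plain store to ->cpu is ordered before the store that makes the node reachable, so any reader that finds the node must observe the completed initialisation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node { int cpu; };

static struct node n;
static _Atomic(struct node *) published;

/* The plain store to ->cpu happens before the release-store that makes
 * the node reachable; a reader can only get at n through 'published'. */
static void publish_node(int cpu)
{
	n.cpu = cpu;	/* pre-publication initialisation */
	atomic_store_explicit(&published, &n, memory_order_release);
}

/* Returns -1 until the node is published; afterwards the acquire-load
 * pairs with the release-store, so ->cpu cannot be observed half-written
 * no matter how the load is split up by the compiler. */
static int reader_cpu(void)
{
	struct node *p = atomic_load_explicit(&published, memory_order_acquire);

	return p ? p->cpu : -1;
}
```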
Peter Zijlstra Jan. 23, 2020, 9:39 a.m. UTC | #7
On Wed, Jan 22, 2020 at 06:54:43PM -0500, Qian Cai wrote:
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 1f7734949ac8..832e87966dcf 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
>                  * wait for either @lock to point to us, through its Step-B, or
>                  * wait for a new @node->next from its Step-C.
>                  */
> -               if (node->next) {
> +               if (READ_ONCE(node->next)) {
>                         next = xchg(&node->next, NULL);
>                         if (next)
>                                 break;

This could possibly trigger the warning, but is a false positive. The
above doesn't fix anything in that even if that load is shattered the
code will function correctly -- it checks for any !0 value, any byte
composite that is !0 is sufficient.

This is in fact something KCSAN compiler infrastructure could deduce.
Qian Cai Jan. 28, 2020, 3:11 a.m. UTC | #8
> On Jan 23, 2020, at 4:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Jan 22, 2020 at 06:54:43PM -0500, Qian Cai wrote:
>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
>> index 1f7734949ac8..832e87966dcf 100644
>> --- a/kernel/locking/osq_lock.c
>> +++ b/kernel/locking/osq_lock.c
>> @@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
>>                 * wait for either @lock to point to us, through its Step-B, or
>>                 * wait for a new @node->next from its Step-C.
>>                 */
>> -               if (node->next) {
>> +               if (READ_ONCE(node->next)) {
>>                        next = xchg(&node->next, NULL);
>>                        if (next)
>>                                break;
> 
> This could possibly trigger the warning, but is a false positive. The
> above doesn't fix anything in that even if that load is shattered the
> code will function correctly -- it checks for any !0 value, any byte
> composite that is !0 is sufficient.
> 
> This is in fact something KCSAN compiler infrastructure could deduce.


Marco, any thought on improving KCSAN for this to reduce the false
positives?
Qian Cai Jan. 28, 2020, 3:12 a.m. UTC | #9
> On Jan 23, 2020, at 4:36 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Jan 22, 2020 at 11:38:51PM +0100, Marco Elver wrote:
> 
>> If possible, decode and get the line numbers. I have observed a data
>> race in osq_lock before, however, this is the only one I have recently
>> seen in osq_lock:
>> 
>> read to 0xffff88812c12d3d4 of 4 bytes by task 23304 on cpu 0:
>>  osq_lock+0x170/0x2f0 kernel/locking/osq_lock.c:143
>> 
>> 	while (!READ_ONCE(node->locked)) {
>> 		/*
>> 		 * If we need to reschedule bail... so we can block.
>> 		 * Use vcpu_is_preempted() to avoid waiting for a preempted
>> 		 * lock holder:
>> 		 */
>> -->		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
>> 			goto unqueue;
>> 
>> 		cpu_relax();
>> 	}
>> 
>> where
>> 
>> 	static inline int node_cpu(struct optimistic_spin_node *node)
>> 	{
>> -->		return node->cpu - 1;
>> 	}
>> 
>> 
>> write to 0xffff88812c12d3d4 of 4 bytes by task 23334 on cpu 1:
>> osq_lock+0x89/0x2f0 kernel/locking/osq_lock.c:99
>> 
>> 	bool osq_lock(struct optimistic_spin_queue *lock)
>> 	{
>> 		struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
>> 		struct optimistic_spin_node *prev, *next;
>> 		int curr = encode_cpu(smp_processor_id());
>> 		int old;
>> 
>> 		node->locked = 0;
>> 		node->next = NULL;
>> -->		node->cpu = curr;
>> 
> 
> Yeah, that's impossible. This store happens before the node is
> published, so no matter how the load in node_cpu() is shattered, it must
> observe the right value.

Marco, any thought on how to do something about this? The worry is that
too many false positives like this will render the tool useless as a
general debug option.
Marco Elver Jan. 28, 2020, 8:18 a.m. UTC | #10
On Tue, 28 Jan 2020 at 04:13, Qian Cai <cai@lca.pw> wrote:
>
> > On Jan 23, 2020, at 4:36 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Jan 22, 2020 at 11:38:51PM +0100, Marco Elver wrote:
> >
> >> If possible, decode and get the line numbers. I have observed a data
> >> race in osq_lock before, however, this is the only one I have recently
> >> seen in osq_lock:
> >>
> >> read to 0xffff88812c12d3d4 of 4 bytes by task 23304 on cpu 0:
> >>  osq_lock+0x170/0x2f0 kernel/locking/osq_lock.c:143
> >>
> >>      while (!READ_ONCE(node->locked)) {
> >>              /*
> >>               * If we need to reschedule bail... so we can block.
> >>               * Use vcpu_is_preempted() to avoid waiting for a preempted
> >>               * lock holder:
> >>               */
> >> -->          if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
> >>                      goto unqueue;
> >>
> >>              cpu_relax();
> >>      }
> >>
> >> where
> >>
> >>      static inline int node_cpu(struct optimistic_spin_node *node)
> >>      {
> >> -->          return node->cpu - 1;
> >>      }
> >>
> >>
> >> write to 0xffff88812c12d3d4 of 4 bytes by task 23334 on cpu 1:
> >> osq_lock+0x89/0x2f0 kernel/locking/osq_lock.c:99
> >>
> >>      bool osq_lock(struct optimistic_spin_queue *lock)
> >>      {
> >>              struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> >>              struct optimistic_spin_node *prev, *next;
> >>              int curr = encode_cpu(smp_processor_id());
> >>              int old;
> >>
> >>              node->locked = 0;
> >>              node->next = NULL;
> >> -->          node->cpu = curr;
> >>
> >
> > Yeah, that's impossible. This store happens before the node is
> > published, so no matter how the load in node_cpu() is shattered, it must
> > observe the right value.
>
> Marco, any thought on how to do something about this? The worry is that
> too many false positives like this will render the tool useless as a
> general debug option.

This should be an instance of same-value-store, since the node->cpu is
per-CPU and smp_processor_id() should always be the same, at least
once it's published. I believe I observed the data race here before
KCSAN had KCSAN_REPORT_VALUE_CHANGE_ONLY enabled on syzbot, and it
hasn't been observed since. For the most part, that should deal with
this case.

I will reply separately to your other email about the other data race.

Thanks,
-- Marco
Qian Cai Jan. 28, 2020, 10:10 a.m. UTC | #11
> On Jan 28, 2020, at 3:18 AM, Marco Elver <elver@google.com> wrote:
> 
> This should be an instance of same-value-store, since the node->cpu is
> per-CPU and smp_processor_id() should always be the same, at least
> once it's published. I believe the data race I observed here before
> KCSAN had KCSAN_REPORT_VALUE_CHANGE_ONLY on syzbot, and hasn't been
> observed since. For the most part, that should deal with this case.

Are you sure? I had KCSAN_REPORT_VALUE_CHANGE_ONLY=y here and still saw a similar splat. I’ll also double check on my side and provide the decoded report.
Marco Elver Jan. 28, 2020, 10:29 a.m. UTC | #12
On Tue, 28 Jan 2020 at 11:10, Qian Cai <cai@lca.pw> wrote:
>
>
>
> > On Jan 28, 2020, at 3:18 AM, Marco Elver <elver@google.com> wrote:
> >
> > This should be an instance of same-value-store, since the node->cpu is
> > per-CPU and smp_processor_id() should always be the same, at least
> > once it's published. I believe the data race I observed here before
> > KCSAN had KCSAN_REPORT_VALUE_CHANGE_ONLY on syzbot, and hasn't been
> > observed since. For the most part, that should deal with this case.
>
> Are you sure? I had KCSAN_REPORT_VALUE_CHANGE_ONLY=y here and still saw a similar splat. I’ll also double check on my side and provide the decoded report.

The data race you reported in this thread is a different one (same
function, but different accesses). I will reply separately.
Marco Elver Jan. 28, 2020, 11:46 a.m. UTC | #13
On Tue, 28 Jan 2020 at 04:11, Qian Cai <cai@lca.pw> wrote:
>
> > On Jan 23, 2020, at 4:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Jan 22, 2020 at 06:54:43PM -0500, Qian Cai wrote:
> >> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> >> index 1f7734949ac8..832e87966dcf 100644
> >> --- a/kernel/locking/osq_lock.c
> >> +++ b/kernel/locking/osq_lock.c
> >> @@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
> >>                 * wait for either @lock to point to us, through its Step-B, or
> >>                 * wait for a new @node->next from its Step-C.
> >>                 */
> >> -               if (node->next) {
> >> +               if (READ_ONCE(node->next)) {
> >>                        next = xchg(&node->next, NULL);
> >>                        if (next)
> >>                                break;
> >
> > This could possibly trigger the warning, but is a false positive. The
> > above doesn't fix anything in that even if that load is shattered the
> > code will function correctly -- it checks for any !0 value, any byte
> > composite that is !0 is sufficient.
> >
> > This is in fact something KCSAN compiler infrastructure could deduce.

Not in the general case. As far as I can tell, this if-statement is
purely optional and an optimization to avoid false sharing. This is
specific knowledge about the logic that (without conveying more
details about the logic) the tool couldn't safely deduce. Consider the
case:

T0:
if ( (x = READ_ONCE(ptr)) ) use_ptr_value(*x);

T1:
WRITE_ONCE(ptr, valid_ptr);

Here, unlike the case above, reading ptr without READ_ONCE can clearly
be dangerous.

The false sharing scenario came up before, and maybe it's worth
telling the tool about the logic. In fact, the 'data_race()' macro is
perfectly well suited to do this.
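The distinction can be sketched in user-space C (the data_race() macro below is a simplified stand-in for the kernel's annotation, which also talks to the KCSAN runtime): a racy load whose value is only tested against zero and then discarded is benign, while a racy pointer load that escapes and is dereferenced is not.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's data_race() annotation: evaluate
 * the expression while marking any race on it as intended. */
#define data_race(expr) (expr)

static int value = 7;
static int *shared_ptr;	/* written by another thread in the kernel case */

/* Benign pattern (the osq_wait_next() hunk): the racy load is only
 * tested against zero and then discarded, so even a byte-wise
 * "shattered" load gives a correct !0 answer. */
static int peek_nonzero(void)
{
	return data_race(shared_ptr) != NULL;
}

/* Dangerous pattern (the T0/T1 example): the loaded pointer escapes and
 * is dereferenced, so under real concurrency this load needs READ_ONCE()
 * to avoid acting on a torn or stale value. */
static int deref_if_set(void)
{
	int *x = shared_ptr;

	return x ? *x : -1;
}
```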

>
> Marco, any thought on improving KCSAN for this to reduce the false
> positives?

Define 'false positive'.

From what I can tell, all 'false positives' that have come up are data
races where the consequences on the behaviour of the code is
inconsequential. In other words, all of them would require
understanding of the intended logic of the code, and understanding if
the worst possible outcome of a data race changes the behaviour of the
code in such a way that we may end up with an erroneously behaving
system.

As I have said before, KCSAN (or any data race detector) by definition
only works at the language level. Any semantic analysis, beyond simple
rules (such as ignore same-value stores) and annotations, is simply
impossible since the tool can't know about the logic that the
programmer intended.

That being said, if there are simple rules (like ignore same-value
stores) or other minimal annotations that can help reduce such 'false
positives', more than happy to add them.

Qian: firstly I suggest you try
CONFIG_KCSAN_REPORT_ONCE_IN_MS=1000000000 as mentioned before so your
system doesn't get spammed, considering you do not use the default
config but want to use all debugging tools at once which seems to
trigger certain data races more than usual.

Secondly, what are your expectations? If you expect the situation to
be perfect tomorrow, you'll be disappointed. This is inherent, given
the problem we face (safe concurrency). Consider the various parts to
this story: concurrent kernel code, the LKMM, people's preferences and
opinions, and KCSAN (which is late to the party). All of them are
still evolving, hopefully together. At least that's my expectation.

What to do about osq_lock here? If people agree that no further
annotations are wanted, and the reasoning above concludes there are no
bugs, we can blacklist the file. That would, however, miss new data
races in future.

Thanks,
-- Marco
Qian Cai Jan. 28, 2020, 12:53 p.m. UTC | #14
> On Jan 28, 2020, at 6:46 AM, Marco Elver <elver@google.com> wrote:
> 
> Qian: firstly I suggest you try
> CONFIG_KCSAN_REPORT_ONCE_IN_MS=1000000000 as mentioned before so your
> system doesn't get spammed, considering you do not use the default
> config but want to use all debugging tools at once which seems to
> trigger certain data races more than usual.

Yes, I had that. There are still many reports that I plan to look at one by one. It takes so much time that it causes systemd storage lookup timeouts, and I needed to manually get out of the emergency shell.

> 
> Secondly, what are your expectations? If you expect the situation to
> be perfect tomorrow, you'll be disappointed. This is inherent, given
> the problem we face (safe concurrency). Consider the various parts to
> this story: concurrent kernel code, the LKMM, people's preferences and
> opinions, and KCSAN (which is late to the party). All of them are
> still evolving, hopefully together. At least that's my expectation.

I’ll try to reduce as many splats as possible, either with data_race(), by disabling the whole file, or by actually fixing them. Any splat left unresolved will hurt the ability to find real data races to some degree.

> 
> What to do about osq_lock here? If people agree that no further
> annotations are wanted, and the reasoning above concludes there are no
> bugs, we can blacklist the file. That would, however, miss new data
> races in future.

This is a question for the locking maintainers. The data_race() macro sounds reasonable to me, but even blacklisting the file is still better than leaving it as-is.
Peter Zijlstra Jan. 28, 2020, 4:52 p.m. UTC | #15
On Tue, Jan 28, 2020 at 12:46:26PM +0100, Marco Elver wrote:
> On Tue, 28 Jan 2020 at 04:11, Qian Cai <cai@lca.pw> wrote:
> >
> > > On Jan 23, 2020, at 4:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Jan 22, 2020 at 06:54:43PM -0500, Qian Cai wrote:
> > >> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > >> index 1f7734949ac8..832e87966dcf 100644
> > >> --- a/kernel/locking/osq_lock.c
> > >> +++ b/kernel/locking/osq_lock.c
> > >> @@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
> > >>                 * wait for either @lock to point to us, through its Step-B, or
> > >>                 * wait for a new @node->next from its Step-C.
> > >>                 */
> > >> -               if (node->next) {
> > >> +               if (READ_ONCE(node->next)) {
> > >>                        next = xchg(&node->next, NULL);
> > >>                        if (next)
> > >>                                break;
> > >
> > > This could possibly trigger the warning, but is a false positive. The
> > > above doesn't fix anything in that even if that load is shattered the
> > > code will function correctly -- it checks for any !0 value, any byte
> > > composite that is !0 is sufficient.
> > >
> > > This is in fact something KCSAN compiler infrastructure could deduce.
> 
> Not in the general case. As far as I can tell, this if-statement is
> purely optional and an optimization to avoid false sharing. This is
> specific knowledge about the logic that (without conveying more
> details about the logic) the tool couldn't safely deduce. Consider the
> case:
> 
> T0:
> if ( (x = READ_ONCE(ptr)) ) use_ptr_value(*x);
> 
> T1:
> WRITE_ONCE(ptr, valid_ptr);
> 
> Here, unlike the case above, reading ptr without READ_ONCE can clearly
> be dangerous.

There is a very big difference here though. In the osq case the result
of the load is only every compared to 0, after which the value is
discarded. While in your example you let the variable escape and use it
again later.

I'm claiming that in the first case, the only thing that's ever done
with a racy load is comparing against 0, there is no possible bad
outcome ever. While obviously if you let the load escape, or do anything
other than compare against 0, there is.
Peter Zijlstra Jan. 28, 2020, 4:56 p.m. UTC | #16
On Tue, Jan 28, 2020 at 12:46:26PM +0100, Marco Elver wrote:

> > Marco, any thought on improving KCSAN for this to reduce the false
> > positives?
> 
> Define 'false positive'.

I'll use it where the code as written is correct while the tool
complains about it.

> From what I can tell, all 'false positives' that have come up are data
> races where the consequences on the behaviour of the code is
> inconsequential. In other words, all of them would require
> understanding of the intended logic of the code, and understanding if
> the worst possible outcome of a data race changes the behaviour of the
> code in such a way that we may end up with an erroneously behaving
> system.
> 
> As I have said before, KCSAN (or any data race detector) by definition
> only works at the language level. Any semantic analysis, beyond simple
> rules (such as ignore same-value stores) and annotations, is simply
> impossible since the tool can't know about the logic that the
> programmer intended.
> 
> That being said, if there are simple rules (like ignore same-value
> stores) or other minimal annotations that can help reduce such 'false
> positives', more than happy to add them.

OK, so KCSAN knows about same-value-stores? If so, that ->cpu =
smp_processor_id() case really doesn't need annotation, right?

> What to do about osq_lock here? If people agree that no further
> annotations are wanted, and the reasoning above concludes there are no
> bugs, we can blacklist the file. That would, however, miss new data
> races in future.

I'm still hoping to convince you that the other case is one of those
'simple-rules' too :-)
Paul E. McKenney Jan. 29, 2020, 12:22 a.m. UTC | #17
On Tue, Jan 28, 2020 at 05:56:55PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 28, 2020 at 12:46:26PM +0100, Marco Elver wrote:
> 
> > > Marco, any thought on improving KCSAN for this to reduce the false
> > > positives?
> > 
> > Define 'false positive'.
> 
> I'll use it where the code as written is correct while the tool
> complains about it.

I could be wrong, but I would guess that Marco is looking for something
a little less subjective and a little more specific.  ;-)

> > From what I can tell, all 'false positives' that have come up are data
> > races where the consequences on the behaviour of the code is
> > inconsequential. In other words, all of them would require
> > understanding of the intended logic of the code, and understanding if
> > the worst possible outcome of a data race changes the behaviour of the
> > code in such a way that we may end up with an erroneously behaving
> > system.
> > 
> > As I have said before, KCSAN (or any data race detector) by definition
> > only works at the language level. Any semantic analysis, beyond simple
> > rules (such as ignore same-value stores) and annotations, is simply
> > impossible since the tool can't know about the logic that the
> > programmer intended.
> > 
> > That being said, if there are simple rules (like ignore same-value
> > stores) or other minimal annotations that can help reduce such 'false
> > positives', more than happy to add them.
> 
> OK, so KCSAN knows about same-value-stores? If so, that ->cpu =
> smp_processor_id() case really doesn't need annotation, right?

If smp_processor_id() returns the value already stored in ->cpu,
I believe that the default KCSAN setup refrains from complaining.

Which reminds me, I need to disable this in my RCU runs.  If I create a
bug that causes me to unknowingly access something that is supposed to
be CPU-private from the wrong CPU, I want to know about it.
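For reference, the two knobs discussed in this thread correspond to real Kconfig options; a debug configuration along the lines Paul describes (do not filter same-value races, but rate-limit duplicate reports) might look like the following sketch (values illustrative):

```
CONFIG_KCSAN=y
# Report cross-CPU writes even when the stored value does not change,
# e.g. a remote same-value write to per-CPU data like ->cpu above:
# CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY is not set
# Rate-limit duplicate reports, as suggested earlier in the thread:
CONFIG_KCSAN_REPORT_ONCE_IN_MS=1000000000
```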

> > What to do about osq_lock here? If people agree that no further
> > annotations are wanted, and the reasoning above concludes there are no
> > bugs, we can blacklist the file. That would, however, miss new data
> > races in future.
> 
> I'm still hoping to convince you that the other case is one of those
> 'simple-rules' too :-)

On this I must defer to Marco.

							Thanx, Paul
Marco Elver Jan. 29, 2020, 3:29 p.m. UTC | #18
On Wed, 29 Jan 2020 at 01:22, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Jan 28, 2020 at 05:56:55PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 28, 2020 at 12:46:26PM +0100, Marco Elver wrote:
> >
> > > > Marco, any thought on improving KCSAN for this to reduce the false
> > > > positives?
> > >
> > > Define 'false positive'.
> >
> > I'll use it where the code as written is correct while the tool
> > complains about it.
>
> I could be wrong, but I would guess that Marco is looking for something
> a little less subjective and a little more specific.  ;-)
>
> > > From what I can tell, all 'false positives' that have come up are data
> > > races where the consequences on the behaviour of the code is
> > > inconsequential. In other words, all of them would require
> > > understanding of the intended logic of the code, and understanding if
> > > the worst possible outcome of a data race changes the behaviour of the
> > > code in such a way that we may end up with an erroneously behaving
> > > system.
> > >
> > > As I have said before, KCSAN (or any data race detector) by definition
> > > only works at the language level. Any semantic analysis, beyond simple
> > > rules (such as ignore same-value stores) and annotations, is simply
> > > impossible since the tool can't know about the logic that the
> > > programmer intended.
> > >
> > > That being said, if there are simple rules (like ignore same-value
> > > stores) or other minimal annotations that can help reduce such 'false
> > > positives', more than happy to add them.
> >
> > OK, so KCSAN knows about same-value-stores? If so, that ->cpu =
> > smp_processor_id() case really doesn't need annotation, right?
>
> If smp_processor_id() returns the value already stored in ->cpu,
> I believe that the default KCSAN setup refrains from complaining.

Yes, it won't report this with KCSAN_REPORT_VALUE_CHANGE_ONLY.  (There
was one case I missed, and just sent a patch to fix.)

> Which reminds me, I need to disable this in my RCU runs.  If I create a
> bug that causes me to unknowingly access something that is supposed to
> be CPU-private from the wrong CPU, I want to know about it.
>
> > > What to do about osq_lock here? If people agree that no further
> > > annotations are wanted, and the reasoning above concludes there are no
> > > bugs, we can blacklist the file. That would, however, miss new data
> > > races in future.
> >
> > I'm still hoping to convince you that the other case is one of those
> > 'simple-rules' too :-)
>
> On this I must defer to Marco.

On Tue, 28 Jan 2020 at 17:52, Peter Zijlstra <peterz@infradead.org> wrote:
> I'm claiming that in the first case, the only thing that's ever done
> with a racy load is comparing against 0, there is no possible bad
> outcome ever. While obviously if you let the load escape, or do anything
> other than compare against 0, there is.

It might sound like a simple rule, but implementing this is anything
but simple: This would require changing the compiler, which we said
we'd like to avoid as it introduces new problems. This particular rule
relies on semantic analysis that is beyond what the TSAN
instrumentation currently supports. Right now we support GCC and
Clang; changing the compiler probably means we'd end up with only one
(probably Clang), and many more years before the change has propagated
to the majority of used compiler versions. It'd be good if we can do
this purely as a change in the kernel's codebase.

Keeping the bigger picture in mind, how frequent is this case, and
what are we really trying to accomplish? Is it only to avoid a
READ_ONCE? Why is the READ_ONCE bad here? If there is a racing access,
why not be explicit about it?

I can see that maybe this particular file probably doesn't need more
hints that there is concurrency here (given its osq_lock), but as a
general rule for the remainder of the kernel it appears to add more
inconsistencies.

Thanks,
-- Marco
Peter Zijlstra Jan. 29, 2020, 6:40 p.m. UTC | #19
On Wed, Jan 29, 2020 at 04:29:43PM +0100, Marco Elver wrote:

> On Tue, 28 Jan 2020 at 17:52, Peter Zijlstra <peterz@infradead.org> wrote:
> > I'm claiming that in the first case, the only thing that's ever done
> > with a racy load is comparing against 0, there is no possible bad
> > outcome ever. While obviously if you let the load escape, or do anything
> > other than compare against 0, there is.
> 
> It might sound like a simple rule, but implementing this is anything
> but simple: This would require changing the compiler,

Right.

> which we said we'd like to avoid as it introduces new problems.

Ah, I missed that brief.

> This particular rule relies on semantic analysis that is beyond what
> the TSAN instrumentation currently supports. Right now we support GCC
> and Clang; changing the compiler probably means we'd end up with only
> one (probably Clang), and many more years before the change has
> propagated to the majority of used compiler versions. It'd be good if
> we can do this purely as a change in the kernel's codebase.

*sigh*, I didn't know there was such a resistance to change the tooling.
That seems very unfortunate :-/

> Keeping the bigger picture in mind, how frequent is this case, and
> what are we really trying to accomplish?

It's trying to avoid the RmW pulling the line in exclusive/modified
state in a loop. The basic C-CAS pattern if you will.

> Is it only to avoid a READ_ONCE? Why is the READ_ONCE bad here? If
> there is a racing access, why not be explicit about it?

It's probably not terrible to put a READ_ONCE() there; we just need to
make sure the compiler doesn't do something stupid (it is known to do
stupid when 'volatile' is present).

But the fact remains that it is entirely superfluous, there is no
possible way the compiler can wreck this.
Peter Zijlstra Jan. 29, 2020, 6:49 p.m. UTC | #20
On Tue, Jan 28, 2020 at 04:22:53PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 28, 2020 at 05:56:55PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 28, 2020 at 12:46:26PM +0100, Marco Elver wrote:
> > 
> > > > Marco, any thought on improving KCSAN for this to reduce the false
> > > > positives?
> > > 
> > > Define 'false positive'.
> > 
> > I'll use it where the code as written is correct while the tool
> > complains about it.
> 
> I could be wrong, but I would guess that Marco is looking for something
> a little less subjective and a little more specific.  ;-)

How is that either? If any valid translation by a compiler results in
correct functionality, yet the tool complains, then surely we can speak
of an objective fact.
Paul E. McKenney Jan. 29, 2020, 7:26 p.m. UTC | #21
On Wed, Jan 29, 2020 at 07:49:35PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 28, 2020 at 04:22:53PM -0800, Paul E. McKenney wrote:
> > On Tue, Jan 28, 2020 at 05:56:55PM +0100, Peter Zijlstra wrote:
> > > On Tue, Jan 28, 2020 at 12:46:26PM +0100, Marco Elver wrote:
> > > 
> > > > > Marco, any thought on improving KCSAN for this to reduce the false
> > > > > positives?
> > > > 
> > > > Define 'false positive'.
> > > 
> > > I'll use it where the code as written is correct while the tool
> > > complains about it.
> > 
> > I could be wrong, but I would guess that Marco is looking for something
> > a little less subjective and a little more specific.  ;-)
> 
> How is that either? If any valid translation by a compiler results in
> correct functionality, yet the tool complains, then surely we can speak
> of an objective fact.

Marco covered my concern in his point about the need to change the
compiler.

In any case, agreed, if a read does nothing but feed into the old/new
values for a CAS, the only thing a reasonable compiler (as opposed to
a just-barely-meets-the-standard demonic compiler) can do to you is to
decrease the CAS success rate.

							Thanx, Paul
Marco Elver Jan. 30, 2020, 1:39 p.m. UTC | #22
On Wed, 29 Jan 2020 at 19:40, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Jan 29, 2020 at 04:29:43PM +0100, Marco Elver wrote:
>
> > On Tue, 28 Jan 2020 at 17:52, Peter Zijlstra <peterz@infradead.org> wrote:
> > > I'm claiming that in the first case, the only thing that's ever done
> > > with a racy load is comparing against 0, there is no possible bad
> > > outcome ever. While obviously if you let the load escape, or do anything
> > > other than compare against 0, there is.
> >
> > It might sound like a simple rule, but implementing this is anything
> > but simple: This would require changing the compiler,
>
> Right.
>
> > which we said we'd like to avoid as it introduces new problems.
>
> Ah, I missed that brief.
>
> > This particular rule relies on semantic analysis that is beyond what
> > the TSAN instrumentation currently supports. Right now we support GCC
> > and Clang; changing the compiler probably means we'd end up with only
> > one (probably Clang), and many more years before the change has
> > propagated to the majority of used compiler versions. It'd be good if
> > we can do this purely as a change in the kernel's codebase.
>
> *sigh*, I didn't know there was such a resistance to change the tooling.
> That seems very unfortunate :-/

Unfortunately. Just wanted to highlight what to expect if we go down
that path. We can put it on a nice-to-have list, but don't expect or
rely on it to happen soon, given the implications above.

> > Keeping the bigger picture in mind, how frequent is this case, and
> > what are we really trying to accomplish?
>
> It's trying to avoid the RmW pulling the line in exclusive/modified
> state in a loop. The basic C-CAS pattern if you will.
>
> > Is it only to avoid a READ_ONCE? Why is the READ_ONCE bad here? If
> > there is a racing access, why not be explicit about it?
>
> It's probably not terrible to put a READ_ONCE() there; we just need to
> make sure the compiler doesn't do something stupid (it is known to do
> stupid when 'volatile' is present).

Maybe we need to optimize READ_ONCE().

'if (data_race(..))' would also work here and has no cost.

> But the fact remains that it is entirely superfluous, there is no
> possible way the compiler can wreck this.

Agree. Still thinking if there is a way to do it without changing the
compiler, but I can't see it right now. :/

Thanks,
-- Marco
Peter Zijlstra Jan. 30, 2020, 1:48 p.m. UTC | #23
On Thu, Jan 30, 2020 at 02:39:38PM +0100, Marco Elver wrote:
> On Wed, 29 Jan 2020 at 19:40, Peter Zijlstra <peterz@infradead.org> wrote:

> > It's probably not terrible to put a READ_ONCE() there; we just need to
> > make sure the compiler doesn't do something stupid (it is known to do
> > stupid when 'volatile' is present).
> 
> Maybe we need to optimize READ_ONCE().

I think recent compilers have gotten better at volatile. In part because
of our complaints.

> 'if (data_race(..))' would also work here and has no cost.

Right, that might be the best option.
Qian Cai Jan. 31, 2020, 3:32 a.m. UTC | #24
> On Jan 30, 2020, at 8:48 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, Jan 30, 2020 at 02:39:38PM +0100, Marco Elver wrote:
>> On Wed, 29 Jan 2020 at 19:40, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>>> It's probably not terrible to put a READ_ONCE() there; we just need to
>>> make sure the compiler doesn't do something stupid (it is known to do
>>> stupid when 'volatile' is present).
>> 
>> Maybe we need to optimize READ_ONCE().
> 
> I think recent compilers have gotten better at volatile. In part because
> of our complaints.
> 
>> 'if (data_race(..))' would also work here and has no cost.
> 
> Right, that might be the best option.
> 

OK, I’ll send a patch for that.

BTW, I have another one to report. Can’t see how the load tearing would
cause any real issue.

[  519.240629] BUG: KCSAN: data-race in osq_lock / osq_unlock

[  519.249088] write (marked) to 0xffff8bb2f133be40 of 8 bytes by task 421 on cpu 38:
[  519.257427]  osq_unlock+0xa8/0x170 kernel/locking/osq_lock.c:219
[  519.261571]  __mutex_lock+0x4b3/0xd20
[  519.265972]  mutex_lock_nested+0x31/0x40
[  519.270639]  memcg_create_kmem_cache+0x2e/0x190
[  519.275922]  memcg_kmem_cache_create_func+0x40/0x80
[  519.281553]  process_one_work+0x54c/0xbe0
[  519.286308]  worker_thread+0x80/0x650
[  519.290715]  kthread+0x1e0/0x200
[  519.294690]  ret_from_fork+0x27/0x50


void osq_unlock(struct optimistic_spin_queue *lock)
{
        struct optimistic_spin_node *node, *next;
        int curr = encode_cpu(smp_processor_id());

        /*
         * Fast path for the uncontended case.
         */
        if (likely(atomic_cmpxchg_release(&lock->tail, curr,
                                          OSQ_UNLOCKED_VAL) == curr))
                return;

        /*
         * Second most likely case.
         */
        node = this_cpu_ptr(&osq_node);
        next = xchg(&node->next, NULL);    <--------------------------
        if (next) {
                WRITE_ONCE(next->locked, 1);
                return;
        }

        next = osq_wait_next(lock, node, NULL);
        if (next)
                WRITE_ONCE(next->locked, 1);
}


[  519.301232] read to 0xffff8bb2f133be40 of 8 bytes by task 196 on cpu 12:
[  519.308705]  osq_lock+0x1e2/0x340 kernel/locking/osq_lock.c:157
[  519.312762]  __mutex_lock+0x277/0xd20
[  519.317167]  mutex_lock_nested+0x31/0x40
[  519.321838]  memcg_create_kmem_cache+0x2e/0x190
[  519.327120]  memcg_kmem_cache_create_func+0x40/0x80
[  519.332751]  process_one_work+0x54c/0xbe0
[  519.337508]  worker_thread+0x80/0x650
[  519.341922]  kthread+0x1e0/0x200
[  519.345889]  ret_from_fork+0x27/0x50


        for (;;) {
                if (prev->next == node &&         <------------------------
                    cmpxchg(&prev->next, node, NULL) == node)
                        break;

                /*
                 * We can only fail the cmpxchg() racing against an unlock(),
                 * in which case we should observe @node->locked becoming
                 * true.
                 */
                if (smp_load_acquire(&node->locked))
                        return true;

                cpu_relax();

                /*
                 * Or we race against a concurrent unqueue()'s step-B, in which
                 * case its step-C will write us a new @node->prev pointer.
                 */
                prev = READ_ONCE(node->prev);
        }


[  519.352420] Reported by Kernel Concurrency Sanitizer on:
[  519.358492] CPU: 12 PID: 196 Comm: kworker/12:1 Tainted: G        W    L    5.5.0-next-20200130+ #3
[  519.368317] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[  519.377627] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func

Patch
diff mbox series

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 6ef600aa0f47..8f565165019a 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -77,7 +77,7 @@  osq_wait_next(struct optimistic_spin_queue *lock,
 		 */
 		if (node->next) {
 			next = xchg(&node->next, NULL);
-			if (next)
+			if (READ_ONCE(next))
 				break;
 		}