All of lore.kernel.org
 help / color / mirror / Atom feed
* Confused about hlist_unhashed_lockless()
@ 2020-01-31 16:43 Will Deacon
  2020-01-31 16:48 ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2020-01-31 16:43 UTC (permalink / raw)
  To: edumazet, tglx, paulmck; +Cc: x86, linux-kernel, peterz

Hi folks,

I just ran into c54a2744497d ("list: Add hlist_unhashed_lockless()")
but I'm a bit confused about what it's trying to achieve. It also seems
to have been merged without any callers (even in -next) -- was that
intentional?

My main source of confusion is the lack of memory barriers. For example,
if you look at the following pair of functions:


static inline int hlist_unhashed_lockless(const struct hlist_node *h)
{
	return !READ_ONCE(h->pprev);
}

static inline void hlist_add_before(struct hlist_node *n,
				    struct hlist_node *next)
{
	WRITE_ONCE(n->pprev, next->pprev);
	WRITE_ONCE(n->next, next);
	WRITE_ONCE(next->pprev, &n->next);
	WRITE_ONCE(*(n->pprev), n);
}


Then running these two concurrently on the same node means that
hlist_unhashed_lockless() doesn't really tell you anything about whether
or not the node is reachable in the list (i.e. there is another node
with a next pointer pointing to it). In other words, I think all of
these outcomes are permitted:

	hlist_unhashed_lockless(n)	n reachable in list
	0				0 (No reordering)
	0				1 (No reordering)
	1				0 (No reordering)
	1				1 (Reorder first and last WRITE_ONCEs)

So I must be missing some details about the use-case here. Please could
you enlighten me? The RCU implementation permits only the first three
outcomes afaict, why not use that and leave non-RCU hlist as it was?

Cheers,

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 16:43 Confused about hlist_unhashed_lockless() Will Deacon
@ 2020-01-31 16:48 ` Eric Dumazet
  2020-01-31 16:57   ` Will Deacon
                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Eric Dumazet @ 2020-01-31 16:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, the arch/x86 maintainers,
	LKML, Peter Zijlstra

On Fri, Jan 31, 2020 at 8:43 AM Will Deacon <will@kernel.org> wrote:
>
> Hi folks,
>
> I just ran into c54a2744497d ("list: Add hlist_unhashed_lockless()")
> but I'm a bit confused about what it's trying to achieve. It also seems
> to have been merged without any callers (even in -next) -- was that
> intentional?
>
> My main source of confusion is the lack of memory barriers. For example,
> if you look at the following pair of functions:
>
>
> static inline int hlist_unhashed_lockless(const struct hlist_node *h)
> {
>         return !READ_ONCE(h->pprev);
> }
>
> static inline void hlist_add_before(struct hlist_node *n,
>                                     struct hlist_node *next)
> {
>         WRITE_ONCE(n->pprev, next->pprev);
>         WRITE_ONCE(n->next, next);
>         WRITE_ONCE(next->pprev, &n->next);
>         WRITE_ONCE(*(n->pprev), n);
> }
>
>
> Then running these two concurrently on the same node means that
> hlist_unhashed_lockless() doesn't really tell you anything about whether
> or not the node is reachable in the list (i.e. there is another node
> with a next pointer pointing to it). In other words, I think all of
> these outcomes are permitted:
>
>         hlist_unhashed_lockless(n)      n reachable in list
>         0                               0 (No reordering)
>         0                               1 (No reordering)
>         1                               0 (No reordering)
>         1                               1 (Reorder first and last WRITE_ONCEs)
>
> So I must be missing some details about the use-case here. Please could
> you enlighten me? The RCU implementation permits only the first three
> outcomes afaict, why not use that and leave non-RCU hlist as it was?
>

I guess the following has been lost :

Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Nov 7 11:23:14 2019 -0800

    timer: use hlist_unhashed_lockless() in timer_pending()

    timer_pending() is mostly used in lockless contexts.

    Without proper annotations, KCSAN might detect a data-race [1]

    Using hlist_unhashed_lockless() instead of hand-coding it
    seems appropriate (as suggested by Paul E. McKenney).

    [1]

    BUG: KCSAN: data-race in del_timer / detach_if_pending

    write to 0xffff88808697d870 of 8 bytes by task 10 on cpu 0:
     __hlist_del include/linux/list.h:764 [inline]
     detach_timer kernel/time/timer.c:815 [inline]
     detach_if_pending+0xcd/0x2d0 kernel/time/timer.c:832
     try_to_del_timer_sync+0x60/0xb0 kernel/time/timer.c:1226
     del_timer_sync+0x6b/0xa0 kernel/time/timer.c:1365
     schedule_timeout+0x2d2/0x6e0 kernel/time/timer.c:1896
     rcu_gp_fqs_loop+0x37c/0x580 kernel/rcu/tree.c:1639
     rcu_gp_kthread+0x143/0x230 kernel/rcu/tree.c:1799
     kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

    read to 0xffff88808697d870 of 8 bytes by task 12060 on cpu 1:
     del_timer+0x3b/0xb0 kernel/time/timer.c:1198
     sk_stop_timer+0x25/0x60 net/core/sock.c:2845
     inet_csk_clear_xmit_timers+0x69/0xa0 net/ipv4/inet_connection_sock.c:523
     tcp_clear_xmit_timers include/net/tcp.h:606 [inline]
     tcp_v4_destroy_sock+0xa3/0x3f0 net/ipv4/tcp_ipv4.c:2096
     inet_csk_destroy_sock+0xf4/0x250 net/ipv4/inet_connection_sock.c:836
     tcp_close+0x6f3/0x970 net/ipv4/tcp.c:2497
     inet_release+0x86/0x100 net/ipv4/af_inet.c:427
     __sock_release+0x85/0x160 net/socket.c:590
     sock_close+0x24/0x30 net/socket.c:1268
     __fput+0x1e1/0x520 fs/file_table.c:280
     ____fput+0x1f/0x30 fs/file_table.c:313
     task_work_run+0xf6/0x130 kernel/task_work.c:113
     tracehook_notify_resume include/linux/tracehook.h:188 [inline]
     exit_to_usermode_loop+0x2b4/0x2c0 arch/x86/entry/common.c:163

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 12060 Comm: syz-executor.5 Not tainted 5.4.0-rc3+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine,

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: "Paul E. McKenney" <paulmck@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 1e6650ed066d5d28251b0bd385fc37ef94c96532..0dc19a8c39c9e49a7cde3d34bfa4be8871cbc1c2
100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct
timer_list *timer) { }
  */
 static inline int timer_pending(const struct timer_list * timer)
 {
- return timer->entry.pprev != NULL;
+ return !hlist_unhashed_lockless(&timer->entry);
 }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 16:48 ` Eric Dumazet
@ 2020-01-31 16:57   ` Will Deacon
  2020-01-31 17:06     ` Eric Dumazet
  2020-01-31 18:43   ` Peter Zijlstra
  2020-01-31 18:52   ` Paul E. McKenney
  2 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2020-01-31 16:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Paul E. McKenney, the arch/x86 maintainers,
	LKML, Peter Zijlstra

On Fri, Jan 31, 2020 at 08:48:05AM -0800, Eric Dumazet wrote:
> On Fri, Jan 31, 2020 at 8:43 AM Will Deacon <will@kernel.org> wrote:
> > I just ran into c54a2744497d ("list: Add hlist_unhashed_lockless()")
> > but I'm a bit confused about what it's trying to achieve. It also seems
> > to have been merged without any callers (even in -next) -- was that
> > intentional?
> >
> > My main source of confusion is the lack of memory barriers. For example,
> > if you look at the following pair of functions:
> >
> >
> > static inline int hlist_unhashed_lockless(const struct hlist_node *h)
> > {
> >         return !READ_ONCE(h->pprev);
> > }
> >
> > static inline void hlist_add_before(struct hlist_node *n,
> >                                     struct hlist_node *next)
> > {
> >         WRITE_ONCE(n->pprev, next->pprev);
> >         WRITE_ONCE(n->next, next);
> >         WRITE_ONCE(next->pprev, &n->next);
> >         WRITE_ONCE(*(n->pprev), n);
> > }
> >
> >
> > Then running these two concurrently on the same node means that
> > hlist_unhashed_lockless() doesn't really tell you anything about whether
> > or not the node is reachable in the list (i.e. there is another node
> > with a next pointer pointing to it). In other words, I think all of
> > these outcomes are permitted:
> >
> >         hlist_unhashed_lockless(n)      n reachable in list
> >         0                               0 (No reordering)
> >         0                               1 (No reordering)
> >         1                               0 (No reordering)
> >         1                               1 (Reorder first and last WRITE_ONCEs)
> >
> > So I must be missing some details about the use-case here. Please could
> > you enlighten me? The RCU implementation permits only the first three
> > outcomes afaict, why not use that and leave non-RCU hlist as it was?
> >
> 
> I guess the following has been lost :

Thanks, although...

> Author: Eric Dumazet <edumazet@google.com>
> Date:   Thu Nov 7 11:23:14 2019 -0800
> 
>     timer: use hlist_unhashed_lockless() in timer_pending()
> 
>     timer_pending() is mostly used in lockless contexts.

... my point above still stands: the value returned by
hlist_unhashed_lockless() doesn't tell you anything about whether or
not the timer is reachable in the hlist or not. The comment above
timer_pending() also states that:

  | Callers must ensure serialization wrt. other operations done to
  | this timer, e.g. interrupt contexts, or other CPUs on SMP.

If that is intended to preclude list operations, shouldn't we use an
RCU hlist instead of throwing {READ,WRITE}_ONCE() at the problem to
shut the sanitiser up without actually fixing anything? :(

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 16:57   ` Will Deacon
@ 2020-01-31 17:06     ` Eric Dumazet
  2020-01-31 17:20       ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2020-01-31 17:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, the arch/x86 maintainers,
	LKML, Peter Zijlstra

On Fri, Jan 31, 2020 at 8:57 AM Will Deacon <will@kernel.org> wrote:
>
> On Fri, Jan 31, 2020 at 08:48:05AM -0800, Eric Dumazet wrote:
> > On Fri, Jan 31, 2020 at 8:43 AM Will Deacon <will@kernel.org> wrote:
> > > I just ran into c54a2744497d ("list: Add hlist_unhashed_lockless()")
> > > but I'm a bit confused about what it's trying to achieve. It also seems
> > > to have been merged without any callers (even in -next) -- was that
> > > intentional?
> > >
> > > My main source of confusion is the lack of memory barriers. For example,
> > > if you look at the following pair of functions:
> > >
> > >
> > > static inline int hlist_unhashed_lockless(const struct hlist_node *h)
> > > {
> > >         return !READ_ONCE(h->pprev);
> > > }
> > >
> > > static inline void hlist_add_before(struct hlist_node *n,
> > >                                     struct hlist_node *next)
> > > {
> > >         WRITE_ONCE(n->pprev, next->pprev);
> > >         WRITE_ONCE(n->next, next);
> > >         WRITE_ONCE(next->pprev, &n->next);
> > >         WRITE_ONCE(*(n->pprev), n);
> > > }
> > >
> > >
> > > Then running these two concurrently on the same node means that
> > > hlist_unhashed_lockless() doesn't really tell you anything about whether
> > > or not the node is reachable in the list (i.e. there is another node
> > > with a next pointer pointing to it). In other words, I think all of
> > > these outcomes are permitted:
> > >
> > >         hlist_unhashed_lockless(n)      n reachable in list
> > >         0                               0 (No reordering)
> > >         0                               1 (No reordering)
> > >         1                               0 (No reordering)
> > >         1                               1 (Reorder first and last WRITE_ONCEs)
> > >
> > > So I must be missing some details about the use-case here. Please could
> > > you enlighten me? The RCU implementation permits only the first three
> > > outcomes afaict, why not use that and leave non-RCU hlist as it was?
> > >
> >
> > I guess the following has been lost :
>
> Thanks, although...
>
> > Author: Eric Dumazet <edumazet@google.com>
> > Date:   Thu Nov 7 11:23:14 2019 -0800
> >
> >     timer: use hlist_unhashed_lockless() in timer_pending()
> >
> >     timer_pending() is mostly used in lockless contexts.
>
> ... my point above still stands: the value returned by
> hlist_unhashed_lockless() doesn't tell you anything about whether or
> not the timer is reachable in the hlist or not. The comment above
> timer_pending() also states that:
>
>   | Callers must ensure serialization wrt. other operations done to
>   | this timer, e.g. interrupt contexts, or other CPUs on SMP.
>
> If that is intended to preclude list operations, shouldn't we use an
> RCU hlist instead of throwing {READ,WRITE}_ONCE() at the problem to
> shut the sanitiser up without actually fixing anything? :(


Sorry, but timer_pending() requires no serialization.

The only thing we need is a READ_ONCE() so that compiler is not allowed
to optimize out stuff like

loop() {
   if (timer_pending())
      something;
}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 17:06     ` Eric Dumazet
@ 2020-01-31 17:20       ` Will Deacon
  2020-01-31 17:33         ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2020-01-31 17:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Paul E. McKenney, the arch/x86 maintainers,
	LKML, Peter Zijlstra

On Fri, Jan 31, 2020 at 09:06:27AM -0800, Eric Dumazet wrote:
> On Fri, Jan 31, 2020 at 8:57 AM Will Deacon <will@kernel.org> wrote:
> > On Fri, Jan 31, 2020 at 08:48:05AM -0800, Eric Dumazet wrote:
> > > On Fri, Jan 31, 2020 at 8:43 AM Will Deacon <will@kernel.org> wrote:
> > > > Then running these two concurrently on the same node means that
> > > > hlist_unhashed_lockless() doesn't really tell you anything about whether
> > > > or not the node is reachable in the list (i.e. there is another node
> > > > with a next pointer pointing to it). In other words, I think all of
> > > > these outcomes are permitted:
> > > >
> > > >         hlist_unhashed_lockless(n)      n reachable in list
> > > >         0                               0 (No reordering)
> > > >         0                               1 (No reordering)
> > > >         1                               0 (No reordering)
> > > >         1                               1 (Reorder first and last WRITE_ONCEs)
> > > >
> > > > So I must be missing some details about the use-case here. Please could
> > > > you enlighten me? The RCU implementation permits only the first three
> > > > outcomes afaict, why not use that and leave non-RCU hlist as it was?
> > > >
> > >
> > > I guess the following has been lost :
> >
> > Thanks, although...
> >
> > > Author: Eric Dumazet <edumazet@google.com>
> > > Date:   Thu Nov 7 11:23:14 2019 -0800
> > >
> > >     timer: use hlist_unhashed_lockless() in timer_pending()
> > >
> > >     timer_pending() is mostly used in lockless contexts.
> >
> > ... my point above still stands: the value returned by
> > hlist_unhashed_lockless() doesn't tell you anything about whether or
> > not the timer is reachable in the hlist or not. The comment above
> > timer_pending() also states that:
> >
> >   | Callers must ensure serialization wrt. other operations done to
> >   | this timer, e.g. interrupt contexts, or other CPUs on SMP.
> >
> > If that is intended to preclude list operations, shouldn't we use an
> > RCU hlist instead of throwing {READ,WRITE}_ONCE() at the problem to
> > shut the sanitiser up without actually fixing anything? :(
> 
> 
> Sorry, but timer_pending() requires no serialization.

Then we should update the comment!

Without serialisation, timer_pending() as currently implemented does
not reliably tell you whether the timer is in the hlist. Is that not a
problem? Using an RCU hlist does not introduce serialisation, but does
at least rule out the case where timer_pending() returns false for a
timer that /is/ reachable in the list by another CPU.

> The only thing we need is a READ_ONCE() so that compiler is not allowed
> to optimize out stuff like
> 
> loop() {
>    if (timer_pending())
>       something;

If that was the case, then you wouldn't need to touch hlist_add_before()
at all so there's got to be more to it than that or we can revert that
part of the patch.

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 17:20       ` Will Deacon
@ 2020-01-31 17:33         ` Eric Dumazet
  2020-01-31 18:32           ` Will Deacon
  2020-01-31 19:47           ` Thomas Gleixner
  0 siblings, 2 replies; 27+ messages in thread
From: Eric Dumazet @ 2020-01-31 17:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, the arch/x86 maintainers,
	LKML, Peter Zijlstra

On Fri, Jan 31, 2020 at 9:21 AM Will Deacon <will@kernel.org> wrote:
>
> On Fri, Jan 31, 2020 at 09:06:27AM -0800, Eric Dumazet wrote:
> > On Fri, Jan 31, 2020 at 8:57 AM Will Deacon <will@kernel.org> wrote:
> > > On Fri, Jan 31, 2020 at 08:48:05AM -0800, Eric Dumazet wrote:
> > > > On Fri, Jan 31, 2020 at 8:43 AM Will Deacon <will@kernel.org> wrote:
> > > > > Then running these two concurrently on the same node means that
> > > > > hlist_unhashed_lockless() doesn't really tell you anything about whether
> > > > > or not the node is reachable in the list (i.e. there is another node
> > > > > with a next pointer pointing to it). In other words, I think all of
> > > > > these outcomes are permitted:
> > > > >
> > > > >         hlist_unhashed_lockless(n)      n reachable in list
> > > > >         0                               0 (No reordering)
> > > > >         0                               1 (No reordering)
> > > > >         1                               0 (No reordering)
> > > > >         1                               1 (Reorder first and last WRITE_ONCEs)
> > > > >
> > > > > So I must be missing some details about the use-case here. Please could
> > > > > you enlighten me? The RCU implementation permits only the first three
> > > > > outcomes afaict, why not use that and leave non-RCU hlist as it was?
> > > > >
> > > >
> > > > I guess the following has been lost :
> > >
> > > Thanks, although...
> > >
> > > > Author: Eric Dumazet <edumazet@google.com>
> > > > Date:   Thu Nov 7 11:23:14 2019 -0800
> > > >
> > > >     timer: use hlist_unhashed_lockless() in timer_pending()
> > > >
> > > >     timer_pending() is mostly used in lockless contexts.
> > >
> > > ... my point above still stands: the value returned by
> > > hlist_unhashed_lockless() doesn't tell you anything about whether or
> > > not the timer is reachable in the hlist or not. The comment above
> > > timer_pending() also states that:
> > >
> > >   | Callers must ensure serialization wrt. other operations done to
> > >   | this timer, e.g. interrupt contexts, or other CPUs on SMP.
> > >
> > > If that is intended to preclude list operations, shouldn't we use an
> > > RCU hlist instead of throwing {READ,WRITE}_ONCE() at the problem to
> > > shut the sanitiser up without actually fixing anything? :(
> >
> >
> > Sorry, but timer_pending() requires no serialization.
>
> Then we should update the comment!

Which one ?

It seems KCSAN does not read the comments :)

>
> Without serialisation, timer_pending() as currently implemented does
> not reliably tell you whether the timer is in the hlist. Is that not a
> problem?

No it is not a problem.

However some callers might have incorrect assumptions, I have not
audited all the code.

 Using an RCU hlist does not introduce serialisation, but does
> at least rule out the case where timer_pending() returns false for a
> timer that /is/ reachable in the list by another CPU.
>
> > The only thing we need is a READ_ONCE() so that compiler is not allowed
> > to optimize out stuff like
> >
> > loop() {
> >    if (timer_pending())
> >       something;
>
> If that was the case, then you wouldn't need to touch hlist_add_before()
> at all so there's got to be more to it than that or we can revert that
> part of the patch.


Sorry, I do not get your point. It would help if you provide a patch
or something.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 17:33         ` Eric Dumazet
@ 2020-01-31 18:32           ` Will Deacon
  2020-01-31 19:47           ` Thomas Gleixner
  1 sibling, 0 replies; 27+ messages in thread
From: Will Deacon @ 2020-01-31 18:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Paul E. McKenney, the arch/x86 maintainers,
	LKML, Peter Zijlstra

On Fri, Jan 31, 2020 at 09:33:55AM -0800, Eric Dumazet wrote:
> On Fri, Jan 31, 2020 at 9:21 AM Will Deacon <will@kernel.org> wrote:
> > On Fri, Jan 31, 2020 at 09:06:27AM -0800, Eric Dumazet wrote:
> > > Sorry, but timer_pending() requires no serialization.
> >
> > Then we should update the comment!
> 
> Which one ?
> 
> It seems KCSAN does not read the comments :)

Haha, that would be nice wouldn't it?

> > Without serialisation, timer_pending() as currently implemented does
> > not reliably tell you whether the timer is in the hlist. Is that not a
> > problem?
> 
> No it is not a problem.
> 
> However some callers might have incorrect assumptions, I have not
> audited all the code.

Of course not, I'm just interested in what correct callers are allowed
to assume.

>  Using an RCU hlist does not introduce serialisation, but does
> > at least rule out the case where timer_pending() returns false for a
> > timer that /is/ reachable in the list by another CPU.
> >
> > > The only thing we need is a READ_ONCE() so that compiler is not allowed
> > > to optimize out stuff like
> > >
> > > loop() {
> > >    if (timer_pending())
> > >       something;
> >
> > If that was the case, then you wouldn't need to touch hlist_add_before()
> > at all so there's got to be more to it than that or we can revert that
> > part of the patch.
> 
> Sorry, I do not get your point. It would help if you provide a patch
> or something.

Ok, diff below. If I've understood you correctly (big if ;), then I think
it's sufficient. The idea is that you have to explicitly annotate your code
as a data race if you're trying to use the non-RCU list implementation in a
lock-free fashion.

Will

--->8

diff --git a/include/linux/list.h b/include/linux/list.h
index 884216db3246..74c78017332e 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -810,7 +810,7 @@ static inline void __hlist_del(struct hlist_node *n)
 
 	WRITE_ONCE(*pprev, next);
 	if (next)
-		WRITE_ONCE(next->pprev, pprev);
+		next->pprev = pprev;
 }
 
 /**
@@ -852,11 +852,11 @@ static inline void hlist_del_init(struct hlist_node *n)
 static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
 {
 	struct hlist_node *first = h->first;
-	WRITE_ONCE(n->next, first);
+	n->next = first;
 	if (first)
-		WRITE_ONCE(first->pprev, &n->next);
+		first->pprev = &n->next;
 	WRITE_ONCE(h->first, n);
-	WRITE_ONCE(n->pprev, &h->first);
+	n->pprev = &h->first;
 }
 
 /**
@@ -867,9 +867,9 @@ static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
 static inline void hlist_add_before(struct hlist_node *n,
 				    struct hlist_node *next)
 {
-	WRITE_ONCE(n->pprev, next->pprev);
-	WRITE_ONCE(n->next, next);
-	WRITE_ONCE(next->pprev, &n->next);
+	n->pprev = next->pprev;
+	n->next = next;
+	next->pprev = &n->next;
 	WRITE_ONCE(*(n->pprev), n);
 }
 
@@ -881,12 +881,12 @@ static inline void hlist_add_before(struct hlist_node *n,
 static inline void hlist_add_behind(struct hlist_node *n,
 				    struct hlist_node *prev)
 {
-	WRITE_ONCE(n->next, prev->next);
-	WRITE_ONCE(prev->next, n);
-	WRITE_ONCE(n->pprev, &prev->next);
+	n->next = prev->next;
+	prev->next = n;
+	n->pprev = &prev->next;
 
 	if (n->next)
-		WRITE_ONCE(n->next->pprev, &n->next);
+		n->next->pprev  = &n->next;
 }
 
 /**
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 1e6650ed066d..53c3b3bc00f0 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -158,13 +158,14 @@ static inline void destroy_timer_on_stack(struct timer_list *timer) { }
  *
  * timer_pending will tell whether a given timer is currently pending,
  * or not. Callers must ensure serialization wrt. other operations done
- * to this timer, eg. interrupt contexts, or other CPUs on SMP.
+ * to this timer, eg. interrupt contexts, or other CPUs on SMP if they
+ * cannot tolerate spurious results.
  *
  * return value: 1 if the timer is pending, 0 if not.
  */
 static inline int timer_pending(const struct timer_list * timer)
 {
-	return timer->entry.pprev != NULL;
+	return !data_race(hlist_unhashed_lockless(&timer->entry));
 }
 
 extern void add_timer_on(struct timer_list *timer, int cpu);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 16:48 ` Eric Dumazet
  2020-01-31 16:57   ` Will Deacon
@ 2020-01-31 18:43   ` Peter Zijlstra
  2020-01-31 18:48     ` Eric Dumazet
  2020-01-31 18:52   ` Paul E. McKenney
  2 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2020-01-31 18:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Will Deacon, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, LKML, Marco Elver

On Fri, Jan 31, 2020 at 08:48:05AM -0800, Eric Dumazet wrote:
>     BUG: KCSAN: data-race in del_timer / detach_if_pending

> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 1e6650ed066d5d28251b0bd385fc37ef94c96532..0dc19a8c39c9e49a7cde3d34bfa4be8871cbc1c2
> 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct
> timer_list *timer) { }
>   */
>  static inline int timer_pending(const struct timer_list * timer)
>  {
> - return timer->entry.pprev != NULL;
> + return !hlist_unhashed_lockless(&timer->entry);
>  }

That's just completely wrong.

Aside from any memory barrier issues that might or might not be there
(I'm waaaay to tired atm to tell), the above code is perfectly fine.

In fact, this is a KCSAN compiler infrastructure 'bug'.

Any load that is only compared to zero is immune to load-tearing issues.

The correct thing to do here is something like:

static inline int timer_pending(const struct timer_list *timer)
{
	/*
	 * KCSAN compiler infrastructure is insuffiently clever to
	 * realize that a 'load compared to zero' is immune to
	 * load-tearing.
	 */
	return data_race(timer->entry.pprev != NULL);
}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 18:43   ` Peter Zijlstra
@ 2020-01-31 18:48     ` Eric Dumazet
  2020-01-31 18:52       ` Eric Dumazet
  2020-01-31 19:20       ` Marco Elver
  0 siblings, 2 replies; 27+ messages in thread
From: Eric Dumazet @ 2020-01-31 18:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, LKML, Marco Elver

On Fri, Jan 31, 2020 at 10:43 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jan 31, 2020 at 08:48:05AM -0800, Eric Dumazet wrote:
> >     BUG: KCSAN: data-race in del_timer / detach_if_pending
>
> > diff --git a/include/linux/timer.h b/include/linux/timer.h
> > index 1e6650ed066d5d28251b0bd385fc37ef94c96532..0dc19a8c39c9e49a7cde3d34bfa4be8871cbc1c2
> > 100644
> > --- a/include/linux/timer.h
> > +++ b/include/linux/timer.h
> > @@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct
> > timer_list *timer) { }
> >   */
> >  static inline int timer_pending(const struct timer_list * timer)
> >  {
> > - return timer->entry.pprev != NULL;
> > + return !hlist_unhashed_lockless(&timer->entry);
> >  }
>
> That's just completely wrong.
>
> Aside from any memory barrier issues that might or might not be there
> (I'm waaaay to tired atm to tell), the above code is perfectly fine.
>
> In fact, this is a KCSAN compiler infrastructure 'bug'.
>
> Any load that is only compared to zero is immune to load-tearing issues.
>
> The correct thing to do here is something like:
>
> static inline int timer_pending(const struct timer_list *timer)
> {
>         /*
>          * KCSAN compiler infrastructure is insuffiently clever to
>          * realize that a 'load compared to zero' is immune to
>          * load-tearing.
>          */
>         return data_race(timer->entry.pprev != NULL);
> }


This is nice, now with have data_race()

Remember these patches were sent 2 months ago, at a time we were
trying to sort out things.

data_race() was merged a few days ago.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 18:48     ` Eric Dumazet
@ 2020-01-31 18:52       ` Eric Dumazet
  2020-01-31 20:53         ` Paul E. McKenney
  2020-02-03 15:45         ` David Laight
  2020-01-31 19:20       ` Marco Elver
  1 sibling, 2 replies; 27+ messages in thread
From: Eric Dumazet @ 2020-01-31 18:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, LKML, Marco Elver

On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
>

> This is nice, now with have data_race()
>
> Remember these patches were sent 2 months ago, at a time we were
> trying to sort out things.
>
> data_race() was merged a few days ago.

Well, actually data_race() is not there yet anyway.

Is it really scheduled for 5.7 kernel ?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 16:48 ` Eric Dumazet
  2020-01-31 16:57   ` Will Deacon
  2020-01-31 18:43   ` Peter Zijlstra
@ 2020-01-31 18:52   ` Paul E. McKenney
  2020-01-31 18:55     ` Eric Dumazet
  2 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2020-01-31 18:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Will Deacon, Thomas Gleixner, the arch/x86 maintainers, LKML,
	Peter Zijlstra

On Fri, Jan 31, 2020 at 08:48:05AM -0800, Eric Dumazet wrote:
> On Fri, Jan 31, 2020 at 8:43 AM Will Deacon <will@kernel.org> wrote:
> >
> > Hi folks,
> >
> > I just ran into c54a2744497d ("list: Add hlist_unhashed_lockless()")
> > but I'm a bit confused about what it's trying to achieve. It also seems
> > to have been merged without any callers (even in -next) -- was that
> > intentional?
> >
> > My main source of confusion is the lack of memory barriers. For example,
> > if you look at the following pair of functions:
> >
> >
> > static inline int hlist_unhashed_lockless(const struct hlist_node *h)
> > {
> >         return !READ_ONCE(h->pprev);
> > }
> >
> > static inline void hlist_add_before(struct hlist_node *n,
> >                                     struct hlist_node *next)
> > {
> >         WRITE_ONCE(n->pprev, next->pprev);
> >         WRITE_ONCE(n->next, next);
> >         WRITE_ONCE(next->pprev, &n->next);
> >         WRITE_ONCE(*(n->pprev), n);
> > }
> >
> >
> > Then running these two concurrently on the same node means that
> > hlist_unhashed_lockless() doesn't really tell you anything about whether
> > or not the node is reachable in the list (i.e. there is another node
> > with a next pointer pointing to it). In other words, I think all of
> > these outcomes are permitted:
> >
> >         hlist_unhashed_lockless(n)      n reachable in list
> >         0                               0 (No reordering)
> >         0                               1 (No reordering)
> >         1                               0 (No reordering)
> >         1                               1 (Reorder first and last WRITE_ONCEs)
> >
> > So I must be missing some details about the use-case here. Please could
> > you enlighten me? The RCU implementation permits only the first three
> > outcomes afaict, why not use that and leave non-RCU hlist as it was?
> 
> I guess the following has been lost :

4d3d2ae81afd ("timer: Use hlist_unhashed_lockless() in timer_pending()")
in -rcu, slated for not this merge window but the next one.  And
including the changes in your later email, Eric.  Please see below
and let me know whether you are OK with it.

							Thanx, Paul

------------------------------------------------------------------------

commit 4d3d2ae81afd9d13394f255e3e1b974f5f9bd2e3
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Nov 7 11:37:38 2019 -0800

    timer: Use hlist_unhashed_lockless() in timer_pending()
    
    The timer_pending() function is mostly used in lockless contexts, so
    Without proper annotations, KCSAN might detect a data-race [1].
    
    Using hlist_unhashed_lockless() instead of hand-coding it seems
    appropriate (as suggested by Paul E. McKenney).
    
    [1]
    
    BUG: KCSAN: data-race in del_timer / detach_if_pending
    
    write to 0xffff88808697d870 of 8 bytes by task 10 on cpu 0:
     __hlist_del include/linux/list.h:764 [inline]
     detach_timer kernel/time/timer.c:815 [inline]
     detach_if_pending+0xcd/0x2d0 kernel/time/timer.c:832
     try_to_del_timer_sync+0x60/0xb0 kernel/time/timer.c:1226
     del_timer_sync+0x6b/0xa0 kernel/time/timer.c:1365
     schedule_timeout+0x2d2/0x6e0 kernel/time/timer.c:1896
     rcu_gp_fqs_loop+0x37c/0x580 kernel/rcu/tree.c:1639
     rcu_gp_kthread+0x143/0x230 kernel/rcu/tree.c:1799
     kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
    
    read to 0xffff88808697d870 of 8 bytes by task 12060 on cpu 1:
     del_timer+0x3b/0xb0 kernel/time/timer.c:1198
     sk_stop_timer+0x25/0x60 net/core/sock.c:2845
     inet_csk_clear_xmit_timers+0x69/0xa0 net/ipv4/inet_connection_sock.c:523
     tcp_clear_xmit_timers include/net/tcp.h:606 [inline]
     tcp_v4_destroy_sock+0xa3/0x3f0 net/ipv4/tcp_ipv4.c:2096
     inet_csk_destroy_sock+0xf4/0x250 net/ipv4/inet_connection_sock.c:836
     tcp_close+0x6f3/0x970 net/ipv4/tcp.c:2497
     inet_release+0x86/0x100 net/ipv4/af_inet.c:427
     __sock_release+0x85/0x160 net/socket.c:590
     sock_close+0x24/0x30 net/socket.c:1268
     __fput+0x1e1/0x520 fs/file_table.c:280
     ____fput+0x1f/0x30 fs/file_table.c:313
     task_work_run+0xf6/0x130 kernel/task_work.c:113
     tracehook_notify_resume include/linux/tracehook.h:188 [inline]
     exit_to_usermode_loop+0x2b4/0x2c0 arch/x86/entry/common.c:163
    
    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 12060 Comm: syz-executor.5 Not tainted 5.4.0-rc3+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine,
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    [ paulmck: Pulled in Eric's later amendments. ]
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 1e6650e..0dc19a8 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct timer_list *timer) { }
  */
 static inline int timer_pending(const struct timer_list * timer)
 {
-	return timer->entry.pprev != NULL;
+	return !hlist_unhashed_lockless(&timer->entry);
 }
 
 extern void add_timer_on(struct timer_list *timer, int cpu);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 4820823..568564a 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -944,6 +944,7 @@ static struct timer_base *lock_timer_base(struct timer_list *timer,
 
 #define MOD_TIMER_PENDING_ONLY		0x01
 #define MOD_TIMER_REDUCE		0x02
+#define MOD_TIMER_NOTPENDING		0x04
 
 static inline int
 __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int options)
@@ -960,7 +961,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
 	 * the timer is re-modified to have the same timeout or ends up in the
 	 * same array bucket then just return:
 	 */
-	if (timer_pending(timer)) {
+	if (!(options & MOD_TIMER_NOTPENDING) && timer_pending(timer)) {
 		/*
 		 * The downside of this optimization is that it can result in
 		 * larger granularity than you would get from adding a new
@@ -1133,7 +1134,7 @@ EXPORT_SYMBOL(timer_reduce);
 void add_timer(struct timer_list *timer)
 {
 	BUG_ON(timer_pending(timer));
-	mod_timer(timer, timer->expires);
+	__mod_timer(timer, timer->expires, MOD_TIMER_NOTPENDING);
 }
 EXPORT_SYMBOL(add_timer);
 
@@ -1891,7 +1892,7 @@ signed long __sched schedule_timeout(signed long timeout)
 
 	timer.task = current;
 	timer_setup_on_stack(&timer.timer, process_timeout, 0);
-	__mod_timer(&timer.timer, expire, 0);
+	__mod_timer(&timer.timer, expire, MOD_TIMER_NOTPENDING);
 	schedule();
 	del_singleshot_timer_sync(&timer.timer);
 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 18:52   ` Paul E. McKenney
@ 2020-01-31 18:55     ` Eric Dumazet
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2020-01-31 18:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, Thomas Gleixner, the arch/x86 maintainers, LKML,
	Peter Zijlstra

On Fri, Jan 31, 2020 at 10:52 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Jan 31, 2020 at 08:48:05AM -0800, Eric Dumazet wrote:
> > On Fri, Jan 31, 2020 at 8:43 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > Hi folks,
> > >
> > > I just ran into c54a2744497d ("list: Add hlist_unhashed_lockless()")
> > > but I'm a bit confused about what it's trying to achieve. It also seems
> > > to have been merged without any callers (even in -next) -- was that
> > > intentional?
> > >
> > > My main source of confusion is the lack of memory barriers. For example,
> > > if you look at the following pair of functions:
> > >
> > >
> > > static inline int hlist_unhashed_lockless(const struct hlist_node *h)
> > > {
> > >         return !READ_ONCE(h->pprev);
> > > }
> > >
> > > static inline void hlist_add_before(struct hlist_node *n,
> > >                                     struct hlist_node *next)
> > > {
> > >         WRITE_ONCE(n->pprev, next->pprev);
> > >         WRITE_ONCE(n->next, next);
> > >         WRITE_ONCE(next->pprev, &n->next);
> > >         WRITE_ONCE(*(n->pprev), n);
> > > }
> > >
> > >
> > > Then running these two concurrently on the same node means that
> > > hlist_unhashed_lockless() doesn't really tell you anything about whether
> > > or not the node is reachable in the list (i.e. there is another node
> > > with a next pointer pointing to it). In other words, I think all of
> > > these outcomes are permitted:
> > >
> > >         hlist_unhashed_lockless(n)      n reachable in list
> > >         0                               0 (No reordering)
> > >         0                               1 (No reordering)
> > >         1                               0 (No reordering)
> > >         1                               1 (Reorder first and last WRITE_ONCEs)
> > >
> > > So I must be missing some details about the use-case here. Please could
> > > you enlighten me? The RCU implementation permits only the first three
> > > outcomes afaict, why not use that and leave non-RCU hlist as it was?
> >
> > I guess the following has been lost :
>
> 4d3d2ae81afd ("timer: Use hlist_unhashed_lockless() in timer_pending()")
> in -rcu, slated for not this merge window but the next one.  And
> including the changes in your later email, Eric.  Please see below
> and let me know whether you are OK with it.
>
>                                                         Thanx, Paul

Well, it seems we only have to wait for data_race() being available, right ?

Then push a patch using data_race() instead of READ_ONCE() thing.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 18:48     ` Eric Dumazet
  2020-01-31 18:52       ` Eric Dumazet
@ 2020-01-31 19:20       ` Marco Elver
  1 sibling, 0 replies; 27+ messages in thread
From: Marco Elver @ 2020-01-31 19:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Peter Zijlstra, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, LKML

On Fri, 31 Jan 2020 at 19:48, Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Jan 31, 2020 at 10:43 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Jan 31, 2020 at 08:48:05AM -0800, Eric Dumazet wrote:
> > >     BUG: KCSAN: data-race in del_timer / detach_if_pending
> >
> > > diff --git a/include/linux/timer.h b/include/linux/timer.h
> > > index 1e6650ed066d5d28251b0bd385fc37ef94c96532..0dc19a8c39c9e49a7cde3d34bfa4be8871cbc1c2
> > > 100644
> > > --- a/include/linux/timer.h
> > > +++ b/include/linux/timer.h
> > > @@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct
> > > timer_list *timer) { }
> > >   */
> > >  static inline int timer_pending(const struct timer_list * timer)
> > >  {
> > > - return timer->entry.pprev != NULL;
> > > + return !hlist_unhashed_lockless(&timer->entry);
> > >  }
> >
> > That's just completely wrong.
> >
> > Aside from any memory barrier issues that might or might not be there
> > (I'm waaaay to tired atm to tell), the above code is perfectly fine.
> >
> > In fact, this is a KCSAN compiler infrastructure 'bug'.
> >
> > Any load that is only compared to zero is immune to load-tearing issues.

We're not just fighting load-tearing, but also other compiler
optimizations. Eric pointed out the loop case:

while (!timer_pending(..)) { }

Without a READ_ONCE, the compiler can turn this into:

if (!timer_pending(..)) {
  while (true) { }
}

So, unless we can enumerate all possible cases in which a compare to 0
data race can occur, and prove they're all safe for all compiler
optimizations, could we plase not claim it's a general KCSAN compiler
infrastructure bug.

This also makes me second-guess if the osq_lock case is safe (looking
at it, I think it's still safe, but only because the surrounding ops
imply compiler barriers and the compiler then won't mess up the loop).

> > The correct thing to do here is something like:
> >
> > static inline int timer_pending(const struct timer_list *timer)
> > {
> >         /*
> >          * KCSAN compiler infrastructure is insuffiently clever to
> >          * realize that a 'load compared to zero' is immune to
> >          * load-tearing.
> >          */

I could agree with this if we knew the context in which this is used.
Unfortunately we do not know the context (unlike osq_lock), and for
all we know it could be used in a loop.

> >         return data_race(timer->entry.pprev != NULL);
> > }
>
>
> This is nice, now with have data_race()

Again, if you use 'data_race()' you'll have to prove that it'll be
safe in all cases, for all compiler versions current and present. The
READ_ONCE would take that burden from you.

Thanks,
-- Marco

> Remember these patches were sent 2 months ago, at a time we were
> trying to sort out things.
>
> data_race() was merged a few days ago.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 17:33         ` Eric Dumazet
  2020-01-31 18:32           ` Will Deacon
@ 2020-01-31 19:47           ` Thomas Gleixner
  2020-01-31 20:52             ` Paul E. McKenney
  1 sibling, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2020-01-31 19:47 UTC (permalink / raw)
  To: Eric Dumazet, Will Deacon
  Cc: Paul E. McKenney, the arch/x86 maintainers, LKML, Peter Zijlstra

Eric Dumazet <edumazet@google.com> writes:
> On Fri, Jan 31, 2020 at 9:21 AM Will Deacon <will@kernel.org> wrote:
>> Without serialisation, timer_pending() as currently implemented does
>> not reliably tell you whether the timer is in the hlist. Is that not a
>> problem?
>
> No it is not a problem.

Even if we would take the base lock then this is just a snapshot, which
can be wrong at the moment the lock is dropped. So why bother?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 19:47           ` Thomas Gleixner
@ 2020-01-31 20:52             ` Paul E. McKenney
  0 siblings, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2020-01-31 20:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Eric Dumazet, Will Deacon, the arch/x86 maintainers, LKML,
	Peter Zijlstra

On Fri, Jan 31, 2020 at 08:47:23PM +0100, Thomas Gleixner wrote:
> Eric Dumazet <edumazet@google.com> writes:
> > On Fri, Jan 31, 2020 at 9:21 AM Will Deacon <will@kernel.org> wrote:
> >> Without serialisation, timer_pending() as currently implemented does
> >> not reliably tell you whether the timer is in the hlist. Is that not a
> >> problem?
> >
> > No it is not a problem.
> 
> Even if we would take the base lock then this is just a snapshot, which
> can be wrong at the moment the lock is dropped. So why bother?

The risk of leaving it as-is or of using data_race() is that if it is
checked multiple times, the compiler might use the old value.  Yes,
we could say that things like barrier() should be used in those cases,
but READ_ONCE() has the advantage of making it so that no one has to
worry about those cases.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 18:52       ` Eric Dumazet
@ 2020-01-31 20:53         ` Paul E. McKenney
  2020-01-31 21:04           ` Eric Dumazet
  2020-02-03 15:45         ` David Laight
  1 sibling, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2020-01-31 20:53 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Peter Zijlstra, Will Deacon, Thomas Gleixner,
	the arch/x86 maintainers, LKML, Marco Elver

On Fri, Jan 31, 2020 at 10:52:46AM -0800, Eric Dumazet wrote:
> On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> 
> > This is nice, now with have data_race()
> >
> > Remember these patches were sent 2 months ago, at a time we were
> > trying to sort out things.
> >
> > data_race() was merged a few days ago.
> 
> Well, actually data_race() is not there yet anyway.
> 
> Is it really scheduled for 5.7 kernel ?

Right now, yes.  Would it make sense to separate it out and push it
into the current merge window?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 20:53         ` Paul E. McKenney
@ 2020-01-31 21:04           ` Eric Dumazet
  2020-01-31 21:19             ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2020-01-31 21:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Will Deacon, Thomas Gleixner,
	the arch/x86 maintainers, LKML, Marco Elver

On Fri, Jan 31, 2020 at 12:53 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Jan 31, 2020 at 10:52:46AM -0800, Eric Dumazet wrote:
> > On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> >
> > > This is nice, now with have data_race()
> > >
> > > Remember these patches were sent 2 months ago, at a time we were
> > > trying to sort out things.
> > >
> > > data_race() was merged a few days ago.
> >
> > Well, actually data_race() is not there yet anyway.
> >
> > Is it really scheduled for 5.7 kernel ?
>
> Right now, yes.  Would it make sense to separate it out and push it
> into the current merge window?


That would be very nice, because we could start using it before KCSAN is merged.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-01-31 21:04           ` Eric Dumazet
@ 2020-01-31 21:19             ` Paul E. McKenney
  0 siblings, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2020-01-31 21:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Peter Zijlstra, Will Deacon, Thomas Gleixner,
	the arch/x86 maintainers, LKML, Marco Elver, mingo

On Fri, Jan 31, 2020 at 01:04:37PM -0800, Eric Dumazet wrote:
> On Fri, Jan 31, 2020 at 12:53 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Jan 31, 2020 at 10:52:46AM -0800, Eric Dumazet wrote:
> > > On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > >
> > > > This is nice, now with have data_race()
> > > >
> > > > Remember these patches were sent 2 months ago, at a time we were
> > > > trying to sort out things.
> > > >
> > > > data_race() was merged a few days ago.
> > >
> > > Well, actually data_race() is not there yet anyway.
> > >
> > > Is it really scheduled for 5.7 kernel ?
> >
> > Right now, yes.  Would it make sense to separate it out and push it
> > into the current merge window?
> 
> That would be very nice, because we could start using it before KCSAN is merged.

Seems reasonable to me.  This one is already inn -tip:

c48981eeb0d5 ("include/linux/compiler.h: Introduce data_race(expr) macro")

This is in tip/WIP.locking/kcsan, so I am adding Ingo on CC.

Ingo, how would you like to proceed?  For example, if it would be
helpful, I could rebase that commit on top of rcu/for-mingo and send a
pull request.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: Confused about hlist_unhashed_lockless()
  2020-01-31 18:52       ` Eric Dumazet
  2020-01-31 20:53         ` Paul E. McKenney
@ 2020-02-03 15:45         ` David Laight
  2020-02-03 15:54           ` Marco Elver
  2020-02-03 15:58           ` Paul E. McKenney
  1 sibling, 2 replies; 27+ messages in thread
From: David Laight @ 2020-02-03 15:45 UTC (permalink / raw)
  To: 'Eric Dumazet', Peter Zijlstra
  Cc: Will Deacon, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, LKML, Marco Elver

From: Eric Dumazet
> Sent: 31 January 2020 18:53
> 
> On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> 
> > This is nice, now with have data_race()
> >
> > Remember these patches were sent 2 months ago, at a time we were
> > trying to sort out things.
> >
> > data_race() was merged a few days ago.
> 
> Well, actually data_race() is not there yet anyway.

Shouldn't it be NO_DATA_RACE() ??

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-02-03 15:45         ` David Laight
@ 2020-02-03 15:54           ` Marco Elver
  2020-02-03 16:05             ` David Laight
  2020-02-03 15:58           ` Paul E. McKenney
  1 sibling, 1 reply; 27+ messages in thread
From: Marco Elver @ 2020-02-03 15:54 UTC (permalink / raw)
  To: David Laight
  Cc: Eric Dumazet, Peter Zijlstra, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, LKML

On Mon, 3 Feb 2020 at 16:45, David Laight <David.Laight@aculab.com> wrote:
>
> From: Eric Dumazet
> > Sent: 31 January 2020 18:53
> >
> > On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> >
> > > This is nice, now with have data_race()
> > >
> > > Remember these patches were sent 2 months ago, at a time we were
> > > trying to sort out things.
> > >
> > > data_race() was merged a few days ago.
> >
> > Well, actually data_race() is not there yet anyway.
>
> Shouldn't it be NO_DATA_RACE() ??

Various options were considered, and based on feedback from Linus,
decided 'data_race(..)' is the best option:
  https://lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/

It's meant to be as unobtrusive as possible, and an all-caps macro was
ruled out.

Second, the "NO_" prefix would be incorrect, since it'd be the
opposite of what it is. The macro is meant to document and mark a
deliberate data race.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-02-03 15:45         ` David Laight
  2020-02-03 15:54           ` Marco Elver
@ 2020-02-03 15:58           ` Paul E. McKenney
  2020-02-03 16:02             ` Will Deacon
  1 sibling, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2020-02-03 15:58 UTC (permalink / raw)
  To: David Laight
  Cc: 'Eric Dumazet',
	Peter Zijlstra, Will Deacon, Thomas Gleixner,
	the arch/x86 maintainers, LKML, Marco Elver

On Mon, Feb 03, 2020 at 03:45:54PM +0000, David Laight wrote:
> From: Eric Dumazet
> > Sent: 31 January 2020 18:53
> > 
> > On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > 
> > > This is nice, now with have data_race()
> > >
> > > Remember these patches were sent 2 months ago, at a time we were
> > > trying to sort out things.
> > >
> > > data_race() was merged a few days ago.
> > 
> > Well, actually data_race() is not there yet anyway.
> 
> Shouldn't it be NO_DATA_RACE() ??

No, because you use data_race() when there really are data races, but you
want KCSAN to ignore them.  For example, diagnostic code that doesn't
participate in the actual concurrency design and that doesn't run all
that often might use data_race().  For another example, if a developer
knew that data races existed, but that the compiler could not reasonably
do anything untoward with those data races, that developer might well
choose to use data_race() instead of READ_ONCE().  Especially if the
access in question was on a fastpath where helpful compiler optimizations
would be prohibited by use of READ_ONCE().

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-02-03 15:58           ` Paul E. McKenney
@ 2020-02-03 16:02             ` Will Deacon
  2020-02-03 17:29               ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2020-02-03 16:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David Laight, 'Eric Dumazet',
	Peter Zijlstra, Thomas Gleixner, the arch/x86 maintainers, LKML,
	Marco Elver

On Mon, Feb 03, 2020 at 07:58:39AM -0800, Paul E. McKenney wrote:
> On Mon, Feb 03, 2020 at 03:45:54PM +0000, David Laight wrote:
> > From: Eric Dumazet
> > > Sent: 31 January 2020 18:53
> > > 
> > > On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > 
> > > > This is nice, now with have data_race()
> > > >
> > > > Remember these patches were sent 2 months ago, at a time we were
> > > > trying to sort out things.
> > > >
> > > > data_race() was merged a few days ago.
> > > 
> > > Well, actually data_race() is not there yet anyway.
> > 
> > Shouldn't it be NO_DATA_RACE() ??
> 
> No, because you use data_race() when there really are data races, but you
> want KCSAN to ignore them.  For example, diagnostic code that doesn't
> participate in the actual concurrency design and that doesn't run all
> that often might use data_race().  For another example, if a developer
> knew that data races existed, but that the compiler could not reasonably
> do anything untoward with those data races, that developer might well
> choose to use data_race() instead of READ_ONCE().  Especially if the
> access in question was on a fastpath where helpful compiler optimizations
> would be prohibited by use of READ_ONCE().

Yes, and in this particular case I think we can remove some WRITE_ONCE()s
from the non-RCU hlist code too (similarly for hlist_nulls).

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: Confused about hlist_unhashed_lockless()
  2020-02-03 15:54           ` Marco Elver
@ 2020-02-03 16:05             ` David Laight
  2020-02-03 17:25               ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: David Laight @ 2020-02-03 16:05 UTC (permalink / raw)
  To: 'Marco Elver'
  Cc: Eric Dumazet, Peter Zijlstra, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, LKML

From: Marco Elver
> Sent: 03 February 2020 15:55
> 
> On Mon, 3 Feb 2020 at 16:45, David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Eric Dumazet
> > > Sent: 31 January 2020 18:53
> > >
> > > On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > >
> > > > This is nice, now with have data_race()
> > > >
> > > > Remember these patches were sent 2 months ago, at a time we were
> > > > trying to sort out things.
> > > >
> > > > data_race() was merged a few days ago.
> > >
> > > Well, actually data_race() is not there yet anyway.
> >
> > Shouldn't it be NO_DATA_RACE() ??
> 
> Various options were considered, and based on feedback from Linus,
> decided 'data_race(..)' is the best option:
>   https://lore.kernel.org/linux-fsdevel/CAHk-
> =wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/
> 
> It's meant to be as unobtrusive as possible, and an all-caps macro was
> ruled out.

Except that it then looks like something that actually does something.

> Second, the "NO_" prefix would be incorrect, since it'd be the
> opposite of what it is. The macro is meant to document and mark a
> deliberate data race.

It should be IGNORE_DATA_RACE() then.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-02-03 16:05             ` David Laight
@ 2020-02-03 17:25               ` Paul E. McKenney
  0 siblings, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2020-02-03 17:25 UTC (permalink / raw)
  To: David Laight
  Cc: 'Marco Elver',
	Eric Dumazet, Peter Zijlstra, Will Deacon, Thomas Gleixner,
	the arch/x86 maintainers, LKML

On Mon, Feb 03, 2020 at 04:05:25PM +0000, David Laight wrote:
> From: Marco Elver
> > Sent: 03 February 2020 15:55
> > 
> > On Mon, 3 Feb 2020 at 16:45, David Laight <David.Laight@aculab.com> wrote:
> > >
> > > From: Eric Dumazet
> > > > Sent: 31 January 2020 18:53
> > > >
> > > > On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> > > > >
> > > >
> > > > > This is nice, now with have data_race()
> > > > >
> > > > > Remember these patches were sent 2 months ago, at a time we were
> > > > > trying to sort out things.
> > > > >
> > > > > data_race() was merged a few days ago.
> > > >
> > > > Well, actually data_race() is not there yet anyway.
> > >
> > > Shouldn't it be NO_DATA_RACE() ??
> > 
> > Various options were considered, and based on feedback from Linus,
> > decided 'data_race(..)' is the best option:
> >   https://lore.kernel.org/linux-fsdevel/CAHk-
> > =wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/
> > 
> > It's meant to be as unobtrusive as possible, and an all-caps macro was
> > ruled out.
> 
> Except that it then looks like something that actually does something.
> 
> > Second, the "NO_" prefix would be incorrect, since it'd be the
> > opposite of what it is. The macro is meant to document and mark a
> > deliberate data race.
> 
> It should be IGNORE_DATA_RACE() then.

People will get used to the name more quickly than they will get used
to typing the extra seven characters.  Here is the current comment header:

/*
 * data_race(): macro to document that accesses in an expression may conflict with
 * other concurrent accesses resulting in data races, but the resulting
 * behaviour is deemed safe regardless.
 *
 * This macro *does not* affect normal code generation, but is a hint to tooling
 * that data races here should be ignored.
 */

I will be converting this to docbook form.

In addition, in the KCSAN documentation:

* KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that
  any data races due to accesses in ``expr`` should be ignored and resulting
  behaviour when encountering a data race is deemed safe.

Fair enough?

								Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-02-03 16:02             ` Will Deacon
@ 2020-02-03 17:29               ` Paul E. McKenney
  2020-02-03 18:27                 ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2020-02-03 17:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: David Laight, 'Eric Dumazet',
	Peter Zijlstra, Thomas Gleixner, the arch/x86 maintainers, LKML,
	Marco Elver

On Mon, Feb 03, 2020 at 04:02:28PM +0000, Will Deacon wrote:
> On Mon, Feb 03, 2020 at 07:58:39AM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 03, 2020 at 03:45:54PM +0000, David Laight wrote:
> > > From: Eric Dumazet
> > > > Sent: 31 January 2020 18:53
> > > > 
> > > > On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> > > > >
> > > > 
> > > > > This is nice, now with have data_race()
> > > > >
> > > > > Remember these patches were sent 2 months ago, at a time we were
> > > > > trying to sort out things.
> > > > >
> > > > > data_race() was merged a few days ago.
> > > > 
> > > > Well, actually data_race() is not there yet anyway.
> > > 
> > > Shouldn't it be NO_DATA_RACE() ??
> > 
> > No, because you use data_race() when there really are data races, but you
> > want KCSAN to ignore them.  For example, diagnostic code that doesn't
> > participate in the actual concurrency design and that doesn't run all
> > that often might use data_race().  For another example, if a developer
> > knew that data races existed, but that the compiler could not reasonably
> > do anything untoward with those data races, that developer might well
> > choose to use data_race() instead of READ_ONCE().  Especially if the
> > access in question was on a fastpath where helpful compiler optimizations
> > would be prohibited by use of READ_ONCE().
> 
> Yes, and in this particular case I think we can remove some WRITE_ONCE()s
> from the non-RCU hlist code too (similarly for hlist_nulls).

Quite possibly, but we should take them case by case.  READ_ONCE()
really does protect against some optimizations, while data_race() does
not at all.

But yes, in some cases you want to -avoid- using READ_ONCE() and
WRITE_ONCE() so that KCSAN can do its job.  For example, given a per-CPU
variable that is only supposed to be accessed from the corresponding CPU
except for reads by diagnostic code, you should have the main algorithm
use plain C-language reads and writes, and have the diagnostic code
use data_race().  This allows KCSAN to correctly flag bugs that access
this per-CPU variable off-CPU while leaving the diagnostic code alone.

Seem reasonable?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-02-03 17:29               ` Paul E. McKenney
@ 2020-02-03 18:27                 ` Will Deacon
  2020-02-03 21:16                   ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2020-02-03 18:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David Laight, 'Eric Dumazet',
	Peter Zijlstra, Thomas Gleixner, the arch/x86 maintainers, LKML,
	Marco Elver

Hi Paul,

On Mon, Feb 03, 2020 at 09:29:47AM -0800, Paul E. McKenney wrote:
> On Mon, Feb 03, 2020 at 04:02:28PM +0000, Will Deacon wrote:
> > On Mon, Feb 03, 2020 at 07:58:39AM -0800, Paul E. McKenney wrote:
> > > On Mon, Feb 03, 2020 at 03:45:54PM +0000, David Laight wrote:
> > > > From: Eric Dumazet
> > > > > Sent: 31 January 2020 18:53
> > > > > 
> > > > > On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> > > > > >
> > > > > 
> > > > > > This is nice, now with have data_race()
> > > > > >
> > > > > > Remember these patches were sent 2 months ago, at a time we were
> > > > > > trying to sort out things.
> > > > > >
> > > > > > data_race() was merged a few days ago.
> > > > > 
> > > > > Well, actually data_race() is not there yet anyway.
> > > > 
> > > > Shouldn't it be NO_DATA_RACE() ??
> > > 
> > > No, because you use data_race() when there really are data races, but you
> > > want KCSAN to ignore them.  For example, diagnostic code that doesn't
> > > participate in the actual concurrency design and that doesn't run all
> > > that often might use data_race().  For another example, if a developer
> > > knew that data races existed, but that the compiler could not reasonably
> > > do anything untoward with those data races, that developer might well
> > > choose to use data_race() instead of READ_ONCE().  Especially if the
> > > access in question was on a fastpath where helpful compiler optimizations
> > > would be prohibited by use of READ_ONCE().
> > 
> > Yes, and in this particular case I think we can remove some WRITE_ONCE()s
> > from the non-RCU hlist code too (similarly for hlist_nulls).
> 
> Quite possibly, but we should take them case by case.  READ_ONCE()
> really does protect against some optimizations, while data_race() does
> not at all.

Agreed, and I plan to send patches for review so we can discuss them in
more detail then.

> But yes, in some cases you want to -avoid- using READ_ONCE() and
> WRITE_ONCE() so that KCSAN can do its job.  For example, given a per-CPU
> variable that is only supposed to be accessed from the corresponding CPU
> except for reads by diagnostic code, you should have the main algorithm
> use plain C-language reads and writes, and have the diagnostic code
> use data_race().  This allows KCSAN to correctly flag bugs that access
> this per-CPU variable off-CPU while leaving the diagnostic code alone.

Yes, and in a similar vein I think the WRITE_ONCE() additions to the hlist
code may hide unintentional racy access to the hlist where I would argue
that the correct behaviour is either to acknowledge the data race (like the
timer code) or to use the RCU variant. The problem with what's currently in
mainline is that it reads a bit like the non-RCU hlist is directly usable as
a lock-free list implementation, which really isn't the case.

> Seem reasonable?

It does to me, but we should probably try to apply this a bit more
consistently in patch review. Adding {READ,WRITE}_ONCE() until the
sanitiser shuts up is easy, but picking that apart later on is a real
challenge.

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Confused about hlist_unhashed_lockless()
  2020-02-03 18:27                 ` Will Deacon
@ 2020-02-03 21:16                   ` Paul E. McKenney
  0 siblings, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2020-02-03 21:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: David Laight, 'Eric Dumazet',
	Peter Zijlstra, Thomas Gleixner, the arch/x86 maintainers, LKML,
	Marco Elver

On Mon, Feb 03, 2020 at 06:27:37PM +0000, Will Deacon wrote:
> Hi Paul,
> 
> On Mon, Feb 03, 2020 at 09:29:47AM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 03, 2020 at 04:02:28PM +0000, Will Deacon wrote:
> > > On Mon, Feb 03, 2020 at 07:58:39AM -0800, Paul E. McKenney wrote:
> > > > On Mon, Feb 03, 2020 at 03:45:54PM +0000, David Laight wrote:
> > > > > From: Eric Dumazet
> > > > > > Sent: 31 January 2020 18:53
> > > > > > 
> > > > > > On Fri, Jan 31, 2020 at 10:48 AM Eric Dumazet <edumazet@google.com> wrote:
> > > > > > >
> > > > > > 
> > > > > > > This is nice, now with have data_race()
> > > > > > >
> > > > > > > Remember these patches were sent 2 months ago, at a time we were
> > > > > > > trying to sort out things.
> > > > > > >
> > > > > > > data_race() was merged a few days ago.
> > > > > > 
> > > > > > Well, actually data_race() is not there yet anyway.
> > > > > 
> > > > > Shouldn't it be NO_DATA_RACE() ??
> > > > 
> > > > No, because you use data_race() when there really are data races, but you
> > > > want KCSAN to ignore them.  For example, diagnostic code that doesn't
> > > > participate in the actual concurrency design and that doesn't run all
> > > > that often might use data_race().  For another example, if a developer
> > > > knew that data races existed, but that the compiler could not reasonably
> > > > do anything untoward with those data races, that developer might well
> > > > choose to use data_race() instead of READ_ONCE().  Especially if the
> > > > access in question was on a fastpath where helpful compiler optimizations
> > > > would be prohibited by use of READ_ONCE().
> > > 
> > > Yes, and in this particular case I think we can remove some WRITE_ONCE()s
> > > from the non-RCU hlist code too (similarly for hlist_nulls).
> > 
> > Quite possibly, but we should take them case by case.  READ_ONCE()
> > really does protect against some optimizations, while data_race() does
> > not at all.
> 
> Agreed, and I plan to send patches for review so we can discuss them in
> more detail then.

Looking forward to seeing them!

> > But yes, in some cases you want to -avoid- using READ_ONCE() and
> > WRITE_ONCE() so that KCSAN can do its job.  For example, given a per-CPU
> > variable that is only supposed to be accessed from the corresponding CPU
> > except for reads by diagnostic code, you should have the main algorithm
> > use plain C-language reads and writes, and have the diagnostic code
> > use data_race().  This allows KCSAN to correctly flag bugs that access
> > this per-CPU variable off-CPU while leaving the diagnostic code alone.
> 
> Yes, and in a similar vein I think the WRITE_ONCE() additions to the hlist
> code may hide unintentional racy access to the hlist where I would argue
> that the correct behaviour is either to acknowledge the data race (like the
> timer code) or to use the RCU variant. The problem with what's currently in
> mainline is that it reads a bit like the non-RCU hlist is directly usable as
> a lock-free list implementation, which really isn't the case.

Fair point.  So your thought is that the RCU variant (or something
similar to it) is used in the lockless case, and that KCSAN complains
about lockless use of the non-READ_ONCE() interface?  If so, that seems
like it might work quite well.

> > Seem reasonable?
> 
> It does to me, but we should probably try to apply this a bit more
> consistently in patch review. Adding {READ,WRITE}_ONCE() until the
> sanitiser shuts up is easy, but picking that apart later on is a real
> challenge.

Agreed.  It does seem like early days for laying out a reasonable set
of use cases, but we do have to start somewhere.  My hope is that the
resulting automated review of concurency will be more than worth the
effort.  Hope springs eternal and all that...  :-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-02-03 21:16 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-31 16:43 Confused about hlist_unhashed_lockless() Will Deacon
2020-01-31 16:48 ` Eric Dumazet
2020-01-31 16:57   ` Will Deacon
2020-01-31 17:06     ` Eric Dumazet
2020-01-31 17:20       ` Will Deacon
2020-01-31 17:33         ` Eric Dumazet
2020-01-31 18:32           ` Will Deacon
2020-01-31 19:47           ` Thomas Gleixner
2020-01-31 20:52             ` Paul E. McKenney
2020-01-31 18:43   ` Peter Zijlstra
2020-01-31 18:48     ` Eric Dumazet
2020-01-31 18:52       ` Eric Dumazet
2020-01-31 20:53         ` Paul E. McKenney
2020-01-31 21:04           ` Eric Dumazet
2020-01-31 21:19             ` Paul E. McKenney
2020-02-03 15:45         ` David Laight
2020-02-03 15:54           ` Marco Elver
2020-02-03 16:05             ` David Laight
2020-02-03 17:25               ` Paul E. McKenney
2020-02-03 15:58           ` Paul E. McKenney
2020-02-03 16:02             ` Will Deacon
2020-02-03 17:29               ` Paul E. McKenney
2020-02-03 18:27                 ` Will Deacon
2020-02-03 21:16                   ` Paul E. McKenney
2020-01-31 19:20       ` Marco Elver
2020-01-31 18:52   ` Paul E. McKenney
2020-01-31 18:55     ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.