All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Eric Dumazet <edumazet@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH 1/2] list: add hlist_unhashed_lockless()
Date: Sat, 9 Nov 2019 09:54:40 -0800	[thread overview]
Message-ID: <20191109175440.GJ20975@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <CANn89iJsh5X4k2SsT0iNdRJPs4k2Hun3EJak1iomcKahmEJJwg@mail.gmail.com>

On Fri, Nov 08, 2019 at 07:15:16PM -0800, Eric Dumazet wrote:
> On Fri, Nov 8, 2019 at 3:42 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Nov 08, 2019 at 12:17:49PM -0800, Eric Dumazet wrote:
> > > On Fri, Nov 8, 2019 at 11:24 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Thu, Nov 07, 2019 at 11:37:37AM -0800, Eric Dumazet wrote:
> > > > > We would like to use hlist_unhashed() from timer_pending(),
> > > > > which runs without protection of a lock.
> > > > >
> > > > > Note that other callers might also want to use this variant.
> > > > >
> > > > > Instead of forcing a READ_ONCE() for all hlist_unhashed()
> > > > > callers, add a new helper with an explicit _lockless suffix
> > > > > in the name to better document what is going on.
> > > > >
> > > > > Also add various WRITE_ONCE() in __hlist_del(), hlist_add_head()
> > > > > and hlist_add_before()/hlist_add_behind() to pair with
> > > > > the READ_ONCE().
> > > > >
> > > > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > >
> > > > I have queued this, but if you prefer it go some other way:
> > > >
> > > > Acked-by: Paul E. McKenney <paulmck@kernel.org>
> > > >
> > > > But shouldn't the uses in include/linux/rculist.h also be converted
> > > > into the patch below?  If so, I will squash the following into your
> > > > patch.
> > > >
> > > >                                                 Thanx, Paul
> > > >
> > > > ------------------------------------------------------------------------
> > >
> > > Agreed, thanks for the addition of this Paul.
> >
> > Very good, squashed and pushed, thank you!
> >
> 
> I have another KCSAN report of a bug that will force us to use
> hlist_unhashed_lockless() from sk_unhashed()
> 
> (Meaning we also need to add some WRITE_ONCE() annotations to
> include/linux/list_nulls.h )
> 
> BUG: KCSAN: data-race in inet_unhash / inet_unhash
> 
> write to 0xffff8880a69a0170 of 8 bytes by interrupt on cpu 1:
>  __hlist_nulls_del include/linux/list_nulls.h:88 [inline]
>  hlist_nulls_del_init_rcu include/linux/rculist_nulls.h:36 [inline]
>  __sk_nulls_del_node_init_rcu include/net/sock.h:676 [inline]
>  inet_unhash+0x38f/0x4a0 net/ipv4/inet_hashtables.c:612
>  tcp_set_state+0xfa/0x3e0 net/ipv4/tcp.c:2249
>  tcp_done+0x93/0x1e0 net/ipv4/tcp.c:3854
>  tcp_write_err+0x7e/0xc0 net/ipv4/tcp_timer.c:56
>  tcp_retransmit_timer+0x9b8/0x16d0 net/ipv4/tcp_timer.c:479
>  tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:599
>  tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:619
>  call_timer_fn+0x5f/0x2f0 kernel/time/timer.c:1404
>  expire_timers kernel/time/timer.c:1449 [inline]
>  __run_timers kernel/time/timer.c:1773 [inline]
>  __run_timers kernel/time/timer.c:1740 [inline]
>  run_timer_softirq+0xc0c/0xcd0 kernel/time/timer.c:1786
>  __do_softirq+0x115/0x33f kernel/softirq.c:292
>  invoke_softirq kernel/softirq.c:373 [inline]
>  irq_exit+0xbb/0xe0 kernel/softirq.c:413
>  exiting_irq arch/x86/include/asm/apic.h:536 [inline]
>  smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
>  native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
>  arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
>  default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
>  cpuidle_idle_call kernel/sched/idle.c:154 [inline]
>  do_idle+0x1af/0x280 kernel/sched/idle.c:263
>  cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
>  start_secondary+0x208/0x260 arch/x86/kernel/smpboot.c:264
>  secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
> 
> read to 0xffff8880a69a0170 of 8 bytes by interrupt on cpu 0:
>  sk_unhashed include/net/sock.h:607 [inline]
>  inet_unhash+0x3d/0x4a0 net/ipv4/inet_hashtables.c:592
>  tcp_set_state+0xfa/0x3e0 net/ipv4/tcp.c:2249
>  tcp_done+0x93/0x1e0 net/ipv4/tcp.c:3854
>  tcp_write_err+0x7e/0xc0 net/ipv4/tcp_timer.c:56
>  tcp_retransmit_timer+0x9b8/0x16d0 net/ipv4/tcp_timer.c:479
>  tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:599
>  tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:619
>  call_timer_fn+0x5f/0x2f0 kernel/time/timer.c:1404
>  expire_timers kernel/time/timer.c:1449 [inline]
>  __run_timers kernel/time/timer.c:1773 [inline]
>  __run_timers kernel/time/timer.c:1740 [inline]
>  run_timer_softirq+0xc0c/0xcd0 kernel/time/timer.c:1786
>  __do_softirq+0x115/0x33f kernel/softirq.c:292
>  invoke_softirq kernel/softirq.c:373 [inline]
>  irq_exit+0xbb/0xe0 kernel/softirq.c:413
>  exiting_irq arch/x86/include/asm/apic.h:536 [inline]
>  smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
>  native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
>  arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
>  default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
>  cpuidle_idle_call kernel/sched/idle.c:154 [inline]
>  do_idle+0x1af/0x280 kernel/sched/idle.c:263
>  cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
>  rest_init+0xec/0xf6 init/main.c:452
>  arch_call_rest_init+0x17/0x37
>  start_kernel+0x838/0x85e init/main.c:786
>  x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490
>  x86_64_start_kernel+0x72/0x76 arch/x86/kernel/head64.c:471
>  secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
> 
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc6+ #0
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011

Like this?

						Thanx, Paul

------------------------------------------------------------------------

commit fef2da9c0cfa4f9ec405ff059fceb00d29de34dc
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Sat Nov 9 09:42:13 2019 -0800

    rcu: Use WRITE_ONCE() for assignments to ->pprev for hlist_nulls
    
    Eric Dumazet supplied a KCSAN report of a bug that forces use
    of hlist_unhashed_lockless() from sk_unhashed():
    
    ------------------------------------------------------------------------
    
    BUG: KCSAN: data-race in inet_unhash / inet_unhash
    
    write to 0xffff8880a69a0170 of 8 bytes by interrupt on cpu 1:
     __hlist_nulls_del include/linux/list_nulls.h:88 [inline]
     hlist_nulls_del_init_rcu include/linux/rculist_nulls.h:36 [inline]
     __sk_nulls_del_node_init_rcu include/net/sock.h:676 [inline]
     inet_unhash+0x38f/0x4a0 net/ipv4/inet_hashtables.c:612
     tcp_set_state+0xfa/0x3e0 net/ipv4/tcp.c:2249
     tcp_done+0x93/0x1e0 net/ipv4/tcp.c:3854
     tcp_write_err+0x7e/0xc0 net/ipv4/tcp_timer.c:56
     tcp_retransmit_timer+0x9b8/0x16d0 net/ipv4/tcp_timer.c:479
     tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:599
     tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:619
     call_timer_fn+0x5f/0x2f0 kernel/time/timer.c:1404
     expire_timers kernel/time/timer.c:1449 [inline]
     __run_timers kernel/time/timer.c:1773 [inline]
     __run_timers kernel/time/timer.c:1740 [inline]
     run_timer_softirq+0xc0c/0xcd0 kernel/time/timer.c:1786
     __do_softirq+0x115/0x33f kernel/softirq.c:292
     invoke_softirq kernel/softirq.c:373 [inline]
     irq_exit+0xbb/0xe0 kernel/softirq.c:413
     exiting_irq arch/x86/include/asm/apic.h:536 [inline]
     smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
     apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
     native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
     arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
     default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
     cpuidle_idle_call kernel/sched/idle.c:154 [inline]
     do_idle+0x1af/0x280 kernel/sched/idle.c:263
     cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
     start_secondary+0x208/0x260 arch/x86/kernel/smpboot.c:264
     secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
    
    read to 0xffff8880a69a0170 of 8 bytes by interrupt on cpu 0:
     sk_unhashed include/net/sock.h:607 [inline]
     inet_unhash+0x3d/0x4a0 net/ipv4/inet_hashtables.c:592
     tcp_set_state+0xfa/0x3e0 net/ipv4/tcp.c:2249
     tcp_done+0x93/0x1e0 net/ipv4/tcp.c:3854
     tcp_write_err+0x7e/0xc0 net/ipv4/tcp_timer.c:56
     tcp_retransmit_timer+0x9b8/0x16d0 net/ipv4/tcp_timer.c:479
     tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:599
     tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:619
     call_timer_fn+0x5f/0x2f0 kernel/time/timer.c:1404
     expire_timers kernel/time/timer.c:1449 [inline]
     __run_timers kernel/time/timer.c:1773 [inline]
     __run_timers kernel/time/timer.c:1740 [inline]
     run_timer_softirq+0xc0c/0xcd0 kernel/time/timer.c:1786
     __do_softirq+0x115/0x33f kernel/softirq.c:292
     invoke_softirq kernel/softirq.c:373 [inline]
     irq_exit+0xbb/0xe0 kernel/softirq.c:413
     exiting_irq arch/x86/include/asm/apic.h:536 [inline]
     smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
     apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
     native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
     arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
     default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
     cpuidle_idle_call kernel/sched/idle.c:154 [inline]
     do_idle+0x1af/0x280 kernel/sched/idle.c:263
     cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
     rest_init+0xec/0xf6 init/main.c:452
     arch_call_rest_init+0x17/0x37
     start_kernel+0x838/0x85e init/main.c:786
     x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490
     x86_64_start_kernel+0x72/0x76 arch/x86/kernel/head64.c:471
     secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
    
    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc6+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine,
    BIOS Google 01/01/2011
    
    ------------------------------------------------------------------------
    
    This commit therefore replaces C-language assignments with WRITE_ONCE()
    in include/linux/list_nulls.h and include/linux/rculist_nulls.h.
    
    Reported-by: Eric Dumazet <edumazet@google.com> # For KCSAN
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/include/linux/list_nulls.h b/include/linux/list_nulls.h
index 3ef9674..1ecd356 100644
--- a/include/linux/list_nulls.h
+++ b/include/linux/list_nulls.h
@@ -72,10 +72,10 @@ static inline void hlist_nulls_add_head(struct hlist_nulls_node *n,
 	struct hlist_nulls_node *first = h->first;
 
 	n->next = first;
-	n->pprev = &h->first;
+	WRITE_ONCE(n->pprev, &h->first);
 	h->first = n;
 	if (!is_a_nulls(first))
-		first->pprev = &n->next;
+		WRITE_ONCE(first->pprev, &n->next);
 }
 
 static inline void __hlist_nulls_del(struct hlist_nulls_node *n)
@@ -85,13 +85,13 @@ static inline void __hlist_nulls_del(struct hlist_nulls_node *n)
 
 	WRITE_ONCE(*pprev, next);
 	if (!is_a_nulls(next))
-		next->pprev = pprev;
+		WRITE_ONCE(next->pprev, pprev);
 }
 
 static inline void hlist_nulls_del(struct hlist_nulls_node *n)
 {
 	__hlist_nulls_del(n);
-	n->pprev = LIST_POISON2;
+	WRITE_ONCE(n->pprev, LIST_POISON2);
 }
 
 /**
diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index bc8206a..517a06f 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -34,7 +34,7 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
 {
 	if (!hlist_nulls_unhashed(n)) {
 		__hlist_nulls_del(n);
-		n->pprev = NULL;
+		WRITE_ONCE(n->pprev, NULL);
 	}
 }
 
@@ -66,7 +66,7 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
 static inline void hlist_nulls_del_rcu(struct hlist_nulls_node *n)
 {
 	__hlist_nulls_del(n);
-	n->pprev = LIST_POISON2;
+	WRITE_ONCE(n->pprev, LIST_POISON2);
 }
 
 /**
@@ -94,10 +94,10 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
 	struct hlist_nulls_node *first = h->first;
 
 	n->next = first;
-	n->pprev = &h->first;
+	WRITE_ONCE(n->pprev, &h->first);
 	rcu_assign_pointer(hlist_nulls_first_rcu(h), n);
 	if (!is_a_nulls(first))
-		first->pprev = &n->next;
+		WRITE_ONCE(first->pprev, &n->next);
 }
 
 /**

  reply	other threads:[~2019-11-10  0:56 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-07 19:37 [PATCH 1/2] list: add hlist_unhashed_lockless() Eric Dumazet
2019-11-07 19:37 ` [PATCH 2/2] timer: use hlist_unhashed_lockless() in timer_pending() Eric Dumazet
2019-11-08 19:27   ` Paul E. McKenney
2019-11-08 19:24 ` [PATCH 1/2] list: add hlist_unhashed_lockless() Paul E. McKenney
2019-11-08 20:17   ` Eric Dumazet
2019-11-08 23:42     ` Paul E. McKenney
2019-11-09  3:15       ` Eric Dumazet
2019-11-09 17:54         ` Paul E. McKenney [this message]
2019-11-09 18:53           ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191109175440.GJ20975@paulmck-ThinkPad-P72 \
    --to=paulmck@kernel.org \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.