* [PATCH] lib: fix data race in rhashtable_rehash_one
@ 2015-09-21  8:08 Dmitry Vyukov
  2015-09-21 13:31 ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2015-09-21  8:08 UTC (permalink / raw)
  To: tgraf, netdev, linux-kernel
  Cc: kcc, andreyknvl, glider, ktsan, paulmck, Dmitry Vyukov

rhashtable_rehash_one() uses plain writes to update entry->next,
while it is being concurrently accessed by readers.
Unfortunately, the compiler is within its rights to (for example) use
byte-at-a-time writes to update the pointer, which would fatally confuse
concurrent readers.
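
Concretely, the reader side is shaped roughly like this (simplified sketch,
not the exact rhashtable_lookup_fast() code; obj_matches() is only a
stand-in for the real compare):

	struct rhash_head *pos;

	rcu_read_lock();
	for (pos = rcu_dereference(tbl->buckets[hash]);
	     !rht_is_a_nulls(pos);
	     pos = rcu_dereference(pos->next)) {	/* races with the rehash write */
		if (obj_matches(ht, pos, key))		/* stand-in for the real compare */
			break;
	}
	rcu_read_unlock();

If the plain store to ->next in the rehash is torn, the rcu_dereference()
above can observe a half-written pointer and the walk dereferences garbage.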

Use WRITE_ONCE to update entry->next in rhashtable_rehash_one().

The data race was found with KernelThreadSanitizer (KTSAN).

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
---
KTSAN report for the record:

ThreadSanitizer: data-race in netlink_lookup

Atomic read at 0xffff880480443bd0 of size 8 by thread 2747 on CPU 11:
 [<     inline     >] rhashtable_lookup_fast include/linux/rhashtable.h:543
 [<     inline     >] __netlink_lookup net/netlink/af_netlink.c:1026
 [<ffffffff81bd9a84>] netlink_lookup+0x134/0x1c0 net/netlink/af_netlink.c:1046
 [<     inline     >] netlink_getsockbyportid net/netlink/af_netlink.c:1616
 [<ffffffff81bdc701>] netlink_unicast+0x111/0x300 net/netlink/af_netlink.c:1812
 [<ffffffff81bdcdb9>] netlink_sendmsg+0x4c9/0x5f0 net/netlink/af_netlink.c:2443
 [<     inline     >] sock_sendmsg_nosec net/socket.c:610
 [<ffffffff81b5d6f3>] sock_sendmsg+0x83/0x90 net/socket.c:620
 [<ffffffff81b5e59f>] ___sys_sendmsg+0x3cf/0x3e0 net/socket.c:1952
 [<ffffffff81b5f6ac>] __sys_sendmsg+0x4c/0xb0 net/socket.c:1986
 [<     inline     >] SYSC_sendmsg net/socket.c:1997
 [<ffffffff81b5f740>] SyS_sendmsg+0x30/0x50 net/socket.c:1993
 [<ffffffff81ee3e11>] entry_SYSCALL_64_fastpath+0x31/0x95 arch/x86/entry/entry_64.S:188

Previous write at 0xffff880480443bd0 of size 8 by thread 213 on CPU 4:
 [<     inline     >] rhashtable_rehash_one lib/rhashtable.c:193
 [<     inline     >] rhashtable_rehash_chain lib/rhashtable.c:213
 [<     inline     >] rhashtable_rehash_table lib/rhashtable.c:257
 [<ffffffff8156f7e0>] rht_deferred_worker+0x3b0/0x6d0 lib/rhashtable.c:373
 [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
 [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529

Mutexes locked by thread 213:
Mutex 217217 is locked here:
 [<ffffffff81ee0407>] mutex_lock+0x57/0x70 kernel/locking/mutex.c:108
 [<ffffffff8156f475>] rht_deferred_worker+0x45/0x6d0 lib/rhashtable.c:363
 [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
 [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529

Mutex 431216 is locked here:
 [<     inline     >] __raw_spin_lock_bh include/linux/spinlock_api_smp.h:149
 [<ffffffff81ee3195>] _raw_spin_lock_bh+0x65/0x80 kernel/locking/spinlock.c:175
 [<     inline     >] spin_lock_bh include/linux/spinlock.h:317
 [<     inline     >] rhashtable_rehash_chain lib/rhashtable.c:212
 [<     inline     >] rhashtable_rehash_table lib/rhashtable.c:257
 [<ffffffff8156f616>] rht_deferred_worker+0x1e6/0x6d0 lib/rhashtable.c:373
 [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
 [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529

Mutex 432766 is locked here:
 [<     inline     >] __raw_spin_lock include/linux/spinlock_api_smp.h:158
 [<ffffffff81ee37d0>] _raw_spin_lock+0x50/0x70 kernel/locking/spinlock.c:151
 [<     inline     >] rhashtable_rehash_one lib/rhashtable.c:186
 [<     inline     >] rhashtable_rehash_chain lib/rhashtable.c:213
 [<     inline     >] rhashtable_rehash_table lib/rhashtable.c:257
 [<ffffffff8156f79b>] rht_deferred_worker+0x36b/0x6d0 lib/rhashtable.c:373
 [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
 [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529
---
 lib/rhashtable.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index cc0c697..978624d 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -188,9 +188,12 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
 				      new_tbl, new_hash);
 
 	if (rht_is_a_nulls(head))
-		INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
-	else
-		RCU_INIT_POINTER(entry->next, head);
+		head = (struct rhash_head *)rht_marker(ht, new_hash);
+	/* We don't insert any new nodes that were not previously accessible
+	 * to readers, so we don't need to use rcu_assign_pointer().
+	 * But entry is being concurrently accessed by readers, so we need to
+	 * use at least WRITE_ONCE. */
+	WRITE_ONCE(entry->next, head);
 
 	rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
 	spin_unlock(new_bucket_lock);
-- 
2.6.0.rc0.131.gf624c3d



* Re: [PATCH] lib: fix data race in rhashtable_rehash_one
  2015-09-21  8:08 [PATCH] lib: fix data race in rhashtable_rehash_one Dmitry Vyukov
@ 2015-09-21 13:31 ` Eric Dumazet
  2015-09-21 14:51   ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2015-09-21 13:31 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tgraf, netdev, linux-kernel, kcc, andreyknvl, glider, ktsan, paulmck

On Mon, 2015-09-21 at 10:08 +0200, Dmitry Vyukov wrote:
> rhashtable_rehash_one() uses plain writes to update entry->next,
> while it is being concurrently accessed by readers.
> Unfortunately, the compiler is within its rights to (for example) use
> byte-at-a-time writes to update the pointer, which would fatally confuse
> concurrent readers.
> 
> Use WRITE_ONCE to update entry->next in rhashtable_rehash_one().
> 
> The data race was found with KernelThreadSanitizer (KTSAN).
> 
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> ---
> KTSAN report for the record:
> 
> ThreadSanitizer: data-race in netlink_lookup
> 
> Atomic read at 0xffff880480443bd0 of size 8 by thread 2747 on CPU 11:
>  [<     inline     >] rhashtable_lookup_fast include/linux/rhashtable.h:543
>  [<     inline     >] __netlink_lookup net/netlink/af_netlink.c:1026
>  [<ffffffff81bd9a84>] netlink_lookup+0x134/0x1c0 net/netlink/af_netlink.c:1046
>  [<     inline     >] netlink_getsockbyportid net/netlink/af_netlink.c:1616
>  [<ffffffff81bdc701>] netlink_unicast+0x111/0x300 net/netlink/af_netlink.c:1812
>  [<ffffffff81bdcdb9>] netlink_sendmsg+0x4c9/0x5f0 net/netlink/af_netlink.c:2443
>  [<     inline     >] sock_sendmsg_nosec net/socket.c:610
>  [<ffffffff81b5d6f3>] sock_sendmsg+0x83/0x90 net/socket.c:620
>  [<ffffffff81b5e59f>] ___sys_sendmsg+0x3cf/0x3e0 net/socket.c:1952
>  [<ffffffff81b5f6ac>] __sys_sendmsg+0x4c/0xb0 net/socket.c:1986
>  [<     inline     >] SYSC_sendmsg net/socket.c:1997
>  [<ffffffff81b5f740>] SyS_sendmsg+0x30/0x50 net/socket.c:1993
>  [<ffffffff81ee3e11>] entry_SYSCALL_64_fastpath+0x31/0x95 arch/x86/entry/entry_64.S:188
> 
> Previous write at 0xffff880480443bd0 of size 8 by thread 213 on CPU 4:
>  [<     inline     >] rhashtable_rehash_one lib/rhashtable.c:193
>  [<     inline     >] rhashtable_rehash_chain lib/rhashtable.c:213
>  [<     inline     >] rhashtable_rehash_table lib/rhashtable.c:257
>  [<ffffffff8156f7e0>] rht_deferred_worker+0x3b0/0x6d0 lib/rhashtable.c:373
>  [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>  [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>  [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
>  [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529
> 
> Mutexes locked by thread 213:
> Mutex 217217 is locked here:
>  [<ffffffff81ee0407>] mutex_lock+0x57/0x70 kernel/locking/mutex.c:108
>  [<ffffffff8156f475>] rht_deferred_worker+0x45/0x6d0 lib/rhashtable.c:363
>  [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>  [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>  [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
>  [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529
> 
> Mutex 431216 is locked here:
>  [<     inline     >] __raw_spin_lock_bh include/linux/spinlock_api_smp.h:149
>  [<ffffffff81ee3195>] _raw_spin_lock_bh+0x65/0x80 kernel/locking/spinlock.c:175
>  [<     inline     >] spin_lock_bh include/linux/spinlock.h:317
>  [<     inline     >] rhashtable_rehash_chain lib/rhashtable.c:212
>  [<     inline     >] rhashtable_rehash_table lib/rhashtable.c:257
>  [<ffffffff8156f616>] rht_deferred_worker+0x1e6/0x6d0 lib/rhashtable.c:373
>  [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>  [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>  [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
>  [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529
> 
> Mutex 432766 is locked here:
>  [<     inline     >] __raw_spin_lock include/linux/spinlock_api_smp.h:158
>  [<ffffffff81ee37d0>] _raw_spin_lock+0x50/0x70 kernel/locking/spinlock.c:151
>  [<     inline     >] rhashtable_rehash_one lib/rhashtable.c:186
>  [<     inline     >] rhashtable_rehash_chain lib/rhashtable.c:213
>  [<     inline     >] rhashtable_rehash_table lib/rhashtable.c:257
>  [<ffffffff8156f79b>] rht_deferred_worker+0x36b/0x6d0 lib/rhashtable.c:373
>  [<ffffffff810b1d6e>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>  [<ffffffff810b22d0>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>  [<ffffffff810bba40>] kthread+0x150/0x170 kernel/kthread.c:209
>  [<ffffffff81ee420f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:529
> ---
>  lib/rhashtable.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index cc0c697..978624d 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -188,9 +188,12 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
>  				      new_tbl, new_hash);
>  
>  	if (rht_is_a_nulls(head))
> -		INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
> -	else
> -		RCU_INIT_POINTER(entry->next, head);
> +		head = (struct rhash_head *)rht_marker(ht, new_hash);
> +	/* We don't insert any new nodes that were not previously accessible
> +	 * to readers, so we don't need to use rcu_assign_pointer().
> +	 * But entry is being concurrently accessed by readers, so we need to
> +	 * use at least WRITE_ONCE. */

This is bogus.

1) Linux is certainly not working if some arch or compiler is not doing
single word writes. WRITE_ONCE() would not help at all to enforce this.

2) If the new node is not yet visible, we don't care if we write
entry->next using any kind of operation.

So the WRITE_ONCE() is not needed at all.



> +	WRITE_ONCE(entry->next, head);


The rcu_assign_pointer() immediately following is enough in this case.

We have hundreds of similar cases in the kernel.
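
The usual pattern is: initialize the not-yet-visible object with plain
stores, then publish it with rcu_assign_pointer(), whose release semantics
order the initialization before the publication. Roughly (illustrative
only; struct test_obj is made up, this is not code from lib/rhashtable.c):

	struct test_obj {			/* made-up object embedding a hash head */
		int			value;
		struct rhash_head	node;
	};

	struct test_obj *obj = kmalloc(sizeof(*obj), GFP_KERNEL);

	if (!obj)
		return -ENOMEM;
	obj->value = value;			/* plain stores: obj not reachable yet */
	RCU_INIT_POINTER(obj->node.next, head);

	/* release barrier: everything above is visible before obj is */
	rcu_assign_pointer(tbl->buckets[hash], &obj->node);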





* Re: [PATCH] lib: fix data race in rhashtable_rehash_one
  2015-09-21 13:31 ` Eric Dumazet
@ 2015-09-21 14:51   ` Eric Dumazet
  2015-09-21 15:10     ` Dmitry Vyukov
  2015-09-21 22:25     ` Thomas Graf
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2015-09-21 14:51 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tgraf, netdev, linux-kernel, kcc, andreyknvl, glider, ktsan, paulmck

On Mon, 2015-09-21 at 06:31 -0700, Eric Dumazet wrote:
> On Mon, 2015-09-21 at 10:08 +0200, Dmitry Vyukov wrote:
> > rhashtable_rehash_one() uses plain writes to update entry->next,
> > while it is being concurrently accessed by readers.
> > Unfortunately, the compiler is within its rights to (for example) use
> > byte-at-a-time writes to update the pointer, which would fatally confuse
> > concurrent readers.
> > 
> This is bogus.
> 
> 1) Linux is certainly not working if some arch or compiler is not doing
> single word writes. WRITE_ONCE() would not help at all to enforce this.
> 
> 2) If the new node is not yet visible, we don't care if we write
> entry->next using any kind of operation.
> 
> So the WRITE_ONCE() is not needed at all.
> 
> 
> 
> > +	WRITE_ONCE(entry->next, head);
> 
> 
> The rcu_assign_pointer() immediately following is enough in this case.
> 
> We have hundreds of similar cases in the kernel.
> 
> 

The changelog and comment are totally confusing.

Please remove the bogus parts in them, and/or rephrase.

The important part here is that we rehash an item, so we need to make
sure to maintain consistent ->next field, and need to prevent compiler
from using ->next as a temporary variable.

ptr->next = 1UL | ((base + offset) << 1);

Is dangerous because compiler could issue :

ptr->next = (base + offset);

ptr->next <<= 1;

ptr->next += 1UL;
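
In source terms, the write in rhashtable_rehash_one() is roughly the
following (NULLS_MARKER() paraphrased from include/linux/list_nulls.h,
'base' standing for the table's nulls base):

	/* what INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash) boils down to */
	entry->next = (struct rhash_head *)(1UL | ((base + new_hash) << 1));

	/* Nothing stops the compiler from building that value piecewise in
	 * entry->next itself, as in the transformation above, while readers
	 * are following ->next.  A single-store primitive (WRITE_ONCE())
	 * would forbid that; not recomputing the value at all, as below,
	 * avoids the question entirely.
	 */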

Frankly, all this looks like an oversight in this code.

Not sure why the NULLS value is even recomputed.


diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index cc0c69710dcf..0a29f07ba45a 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -187,10 +187,7 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
 	head = rht_dereference_bucket(new_tbl->buckets[new_hash],
 				      new_tbl, new_hash);
 
-	if (rht_is_a_nulls(head))
-		INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
-	else
-		RCU_INIT_POINTER(entry->next, head);
+	RCU_INIT_POINTER(entry->next, head);
 
 	rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
 	spin_unlock(new_bucket_lock);




* Re: [PATCH] lib: fix data race in rhashtable_rehash_one
  2015-09-21 14:51   ` Eric Dumazet
@ 2015-09-21 15:10     ` Dmitry Vyukov
  2015-09-21 15:15       ` Eric Dumazet
  2015-09-21 22:25     ` Thomas Graf
  1 sibling, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2015-09-21 15:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: tgraf, netdev, LKML, Kostya Serebryany, Andrey Konovalov,
	Alexander Potapenko, ktsan, Paul McKenney

On Mon, Sep 21, 2015 at 4:51 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2015-09-21 at 06:31 -0700, Eric Dumazet wrote:
>> On Mon, 2015-09-21 at 10:08 +0200, Dmitry Vyukov wrote:
>> > rhashtable_rehash_one() uses plain writes to update entry->next,
>> > while it is being concurrently accessed by readers.
>> > Unfortunately, the compiler is within its rights to (for example) use
>> > byte-at-a-time writes to update the pointer, which would fatally confuse
>> > concurrent readers.
>> >
>> This is bogus.
>>
>> 1) Linux is certainly not working if some arch or compiler is not doing
>> single word writes. WRITE_ONCE() would not help at all to enforce this.
>>
>> 2) If the new node is not yet visible, we don't care if we write
>> entry->next using any kind of operation.
>>
>> So the WRITE_ONCE() is not needed at all.
>>
>>
>>
>> > +   WRITE_ONCE(entry->next, head);
>>
>>
>> The rcu_assign_pointer() immediately following is enough in this case.
>>
>> We have hundreds of similar cases in the kernel.
>>
>>
>
> The changelog and comment are totally confusing.
>
> Please remove the bogus parts in them, and/or rephrase.
>
> The important part here is that we rehash an item, so we need to make
> sure to maintain consistent ->next field, and need to prevent compiler
> from using ->next as a temporary variable.
>
> ptr->next = 1UL | ((base + offset) << 1);
>
> Is dangerous because compiler could issue :
>
> ptr->next = (base + offset);
>
> ptr->next <<= 1;
>
> ptr->next += 1UL;
>
> Frankly, all this looks like an oversight in this code.
>
> Not sure why the NULLS value is even recomputed.

I have not looked in detail yet, but the NULLS recomputation uses
new_hash, which obviously wasn't available when the value was
previously computed. I don't know yet whether it is important or not.



>
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index cc0c69710dcf..0a29f07ba45a 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -187,10 +187,7 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
>         head = rht_dereference_bucket(new_tbl->buckets[new_hash],
>                                       new_tbl, new_hash);
>
> -       if (rht_is_a_nulls(head))
> -               INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
> -       else
> -               RCU_INIT_POINTER(entry->next, head);
> +       RCU_INIT_POINTER(entry->next, head);
>
>         rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
>         spin_unlock(new_bucket_lock);
>
>



-- 
Dmitry Vyukov, Software Engineer, dvyukov@google.com


* Re: [PATCH] lib: fix data race in rhashtable_rehash_one
  2015-09-21 15:10     ` Dmitry Vyukov
@ 2015-09-21 15:15       ` Eric Dumazet
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2015-09-21 15:15 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tgraf, netdev, LKML, Kostya Serebryany, Andrey Konovalov,
	Alexander Potapenko, ktsan, Paul McKenney

On Mon, 2015-09-21 at 17:10 +0200, Dmitry Vyukov wrote:
> On Mon, Sep 21, 2015 at 4:51 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2015-09-21 at 06:31 -0700, Eric Dumazet wrote:
> >> On Mon, 2015-09-21 at 10:08 +0200, Dmitry Vyukov wrote:
> >> > rhashtable_rehash_one() uses plain writes to update entry->next,
> >> > while it is being concurrently accessed by readers.
> >> > Unfortunately, the compiler is within its rights to (for example) use
> >> > byte-at-a-time writes to update the pointer, which would fatally confuse
> >> > concurrent readers.
> >> >
> >> This is bogus.
> >>
> >> 1) Linux is certainly not working if some arch or compiler is not doing
> >> single word writes. WRITE_ONCE() would not help at all to enforce this.
> >>
> >> 2) If the new node is not yet visible, we don't care if we write
> >> entry->next using any kind of operation.
> >>
> >> So the WRITE_ONCE() is not needed at all.
> >>
> >>
> >>
> >> > +   WRITE_ONCE(entry->next, head);
> >>
> >>
> >> The rcu_assign_pointer() immediately following is enough in this case.
> >>
> >> We have hundreds of similar cases in the kernel.
> >>
> >>
> >
> > The changelog and comment are totally confusing.
> >
> > Please remove the bogus parts in them, and/or rephrase.
> >
> > The important part here is that we rehash an item, so we need to make
> > sure to maintain consistent ->next field, and need to prevent compiler
> > from using ->next as a temporary variable.
> >
> > ptr->next = 1UL | ((base + offset) << 1);
> >
> > Is dangerous because compiler could issue :
> >
> > ptr->next = (base + offset);
> >
> > ptr->next <<= 1;
> >
> > ptr->next += 1UL;
> >
> > Frankly, all this looks like an oversight in this code.
> >
> > Not sure why the NULLS value is even recomputed.
> 
> I have not looked in detail yet, but the NULLS recomputation uses
> new_hash, which obviously wasn't available when the value was
> previously computed. Don't know yet whether it is important or not.


Well, head already contains the right value, set in bucket_table_alloc()

for (i = 0; i < nbuckets; i++)
    INIT_RHT_NULLS_HEAD(tbl->buckets[i], ht, i);

Think of this nulls value as a special NULL pointer.

If the hash table is properly allocated/initialized, all the chains
correctly end with a proper NULL pointer.
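
Roughly, the marker encoding and test look like this (paraphrased from
include/linux/list_nulls.h and include/linux/rhashtable.h, not verbatim):

	#define NULLS_MARKER(value)	(1UL | (((long)value) << 1))

	static inline bool rht_is_a_nulls(const struct rhash_head *ptr)
	{
		/* real rhash_head pointers are at least 2-byte aligned, so a
		 * set bit 0 means "end-of-chain marker", not a pointer
		 */
		return ((unsigned long)ptr & 1);
	}

So an empty bucket is not NULL: it already holds the marker for its own
chain, which is exactly the value the rehash code was recomputing.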






* Re: [PATCH] lib: fix data race in rhashtable_rehash_one
  2015-09-21 14:51   ` Eric Dumazet
  2015-09-21 15:10     ` Dmitry Vyukov
@ 2015-09-21 22:25     ` Thomas Graf
  2015-09-21 23:03       ` Eric Dumazet
  1 sibling, 1 reply; 10+ messages in thread
From: Thomas Graf @ 2015-09-21 22:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dmitry Vyukov, netdev, linux-kernel, kcc, andreyknvl, glider,
	ktsan, paulmck

On 09/21/15 at 07:51am, Eric Dumazet wrote:
> The important part here is that we rehash an item, so we need to make
> sure to maintain consistent ->next field, and need to prevent compiler
> from using ->next as a temporary variable.
> 
> ptr->next = 1UL | ((base + offset) << 1);
> 
> Is dangerous because compiler could issue :
> 
> ptr->next = (base + offset);
> 
> ptr->next <<= 1;
> 
> ptr->next += 1UL;
> 
> Frankly, all this looks like an oversight in this code.
> 
> Not sure why the NULLS value is even recomputed.

The hash of the chain is part of the NULLS value. Since the
entry might have been moved to a different chain, the NULLS
value must be recalculated to contain the proper hash.

However, nobody is using the hash today as far as I can see, so we
could just as well remove it and use only the base value for the
nulls marker.
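
For comparison, what an embedded hash would enable is the nulls-restart
trick used by e.g. the TCP/UDP socket lookups: detect that a concurrent
move left the walk on the wrong chain and retry. A rough sketch
(matches(), get_nulls_value_of() and expected_marker() are placeholders;
rhashtable does not do this today):

	restart:
		rht_for_each_rcu(pos, tbl, hash) {
			if (matches(pos, key))			/* placeholder compare */
				return rht_obj(ht, pos);
		}
		/* pos now holds the nulls marker we ended on.  If a concurrent
		 * move put us on another chain, the marker encodes a different
		 * hash: retry instead of returning a false "not found".
		 */
		if (get_nulls_value_of(pos) != expected_marker(ht, hash))	/* placeholders */
			goto restart;
		return NULL;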


* Re: [PATCH] lib: fix data race in rhashtable_rehash_one
  2015-09-21 22:25     ` Thomas Graf
@ 2015-09-21 23:03       ` Eric Dumazet
  2015-09-22  8:19         ` Thomas Graf
  2015-09-22 15:18         ` Herbert Xu
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2015-09-21 23:03 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Dmitry Vyukov, netdev, linux-kernel, kcc, andreyknvl, glider,
	ktsan, paulmck

On Tue, 2015-09-22 at 00:25 +0200, Thomas Graf wrote:
> On 09/21/15 at 07:51am, Eric Dumazet wrote:
> > The important part here is that we rehash an item, so we need to make
> > sure to maintain consistent ->next field, and need to prevent compiler
> > from using ->next as a temporary variable.
> > 
> > ptr->next = 1UL | ((base + offset) << 1);
> > 
> > Is dangerous because compiler could issue :
> > 
> > ptr->next = (base + offset);
> > 
> > ptr->next <<= 1;
> > 
> > ptr->next += 1UL;
> > 
> > Frankly, all this looks like an oversight in this code.
> > 
> > Not sure why the NULLS value is even recomputed.
> 
> The hash of the chain is part of the NULLS value. Since the
> entry might have been moved to a different chain, the NULLS
> value must be recalculated to contain the proper hash.
> 
> However, nobody is using the hash today as far as I can
> see so we could as well just remove it and use the base
> value only for the nulls marker.

What I said is :

In @head you already have the correct nulls value, from hash table.

You do not need to recompute this value, and/or test if hash table chain
is empty.

If hash bucket is empty, it contains the appropriate NULLS value.

If you are paranoiac add this debugging check :

if (rht_is_a_nulls(head))
    BUG_ON(head != (struct rhash_head *)rht_marker(ht, new_hash));


Therefore, simply fix the bug and unnecessary code with :

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index cc0c69710dcf..a54ff8949f91 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -187,10 +187,7 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
 	head = rht_dereference_bucket(new_tbl->buckets[new_hash],
 				      new_tbl, new_hash);
 
-	if (rht_is_a_nulls(head))
-		INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
-	else
-		RCU_INIT_POINTER(entry->next, head);
+	RCU_INIT_POINTER(entry->next, head);
 
 	rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
 	spin_unlock(new_bucket_lock);




* Re: [PATCH] lib: fix data race in rhashtable_rehash_one
  2015-09-21 23:03       ` Eric Dumazet
@ 2015-09-22  8:19         ` Thomas Graf
  2015-09-22  8:52           ` Dmitry Vyukov
  2015-09-22 15:18         ` Herbert Xu
  1 sibling, 1 reply; 10+ messages in thread
From: Thomas Graf @ 2015-09-22  8:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dmitry Vyukov, netdev, linux-kernel, kcc, andreyknvl, glider,
	ktsan, paulmck

On 09/21/15 at 04:03pm, Eric Dumazet wrote:
> What I said is :
> 
> In @head you already have the correct nulls value, from hash table.
> 
> You do not need to recompute this value, and/or test if hash table chain
> is empty.
> 
> If hash bucket is empty, it contains the appropriate NULLS value.
> 
> If you are paranoiac add this debugging check :
> 
> if (rht_is_a_nulls(head))
>     BUG_ON(head != (struct rhash_head *)rht_marker(ht, new_hash));
> 
> 
> Therefore, simply fix the bug and unnecessary code with :

You are absolutely right, Eric. Do you want to revise your patch, Dmitry?
Eric's proposed fix is absolutely the best way to fix this.


* Re: [PATCH] lib: fix data race in rhashtable_rehash_one
  2015-09-22  8:19         ` Thomas Graf
@ 2015-09-22  8:52           ` Dmitry Vyukov
  0 siblings, 0 replies; 10+ messages in thread
From: Dmitry Vyukov @ 2015-09-22  8:52 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Eric Dumazet, netdev, LKML, Kostya Serebryany, Andrey Konovalov,
	Alexander Potapenko, ktsan, Paul McKenney

On Tue, Sep 22, 2015 at 10:19 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 09/21/15 at 04:03pm, Eric Dumazet wrote:
>> What I said is :
>>
>> In @head you already have the correct nulls value, from hash table.
>>
>> You do not need to recompute this value, and/or test if hash table chain
>> is empty.
>>
>> If hash bucket is empty, it contains the appropriate NULLS value.
>>
>> If you are paranoiac add this debugging check :
>>
>> if (rht_is_a_nulls(head))
>>     BUG_ON(head != (struct rhash_head *)rht_marker(ht, new_hash));
>>
>>
>> Therefore, simply fix the bug and unnecessary code with :
>
> You are absolutely right Eric. Do you want to revise your patch Dmitry?
> Eric's proposed fix absolutely the best way to fix this.

Mailed v2 of the patch.

-- 
Dmitry Vyukov, Software Engineer, dvyukov@google.com


* Re: [PATCH] lib: fix data race in rhashtable_rehash_one
  2015-09-21 23:03       ` Eric Dumazet
  2015-09-22  8:19         ` Thomas Graf
@ 2015-09-22 15:18         ` Herbert Xu
  1 sibling, 0 replies; 10+ messages in thread
From: Herbert Xu @ 2015-09-22 15:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: tgraf, dvyukov, netdev, linux-kernel, kcc, andreyknvl, glider,
	ktsan, paulmck

Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> What I said is :
> 
> In @head you already have the correct nulls value, from hash table.
> 
> You do not need to recompute this value, and/or test if hash table chain
> is empty.
> 
> If hash bucket is empty, it contains the appropriate NULLS value.
> 
> If you are paranoiac add this debugging check :
> 
> if (rht_is_a_nulls(head))
>    BUG_ON(head != (struct rhash_head *)rht_marker(ht, new_hash));
> 
> 
> Therefore, simply fix the bug and unnecessary code with :

Ack.  I remember seeing this when I was working on it but never
got around to removing this bogosity.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

