* [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
@ 2013-05-21  9:05 Roman Gushchin
  2013-05-21 10:40 ` David Laight
  2013-05-21 12:09 ` Paul E. McKenney
  0 siblings, 2 replies; 57+ messages in thread
From: Roman Gushchin @ 2013-05-21  9:05 UTC (permalink / raw)
  To: Dipankar Sarma, Paul E. McKenney
  Cc: zhmurov, linux-kernel, netdev, David S. Miller, Eric Dumazet,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy

Hi, all!

This is a fix for a problem described here:
https://lkml.org/lkml/2013/4/16/371 .
---

Some network functions (udp4_lib_lookup2(), for instance) use the
hlist_nulls_for_each_entry_rcu macro in a way that assumes the loop may
be restarted. In that case, it is strictly necessary to reread the
head->first value from memory before each rescan.
Without additional hints, gcc caches this value in a register, and if a
cached node is moved to another chain during the scan, we can loop
forever, reading wrong nulls values and restarting the loop endlessly.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
---
  include/linux/rculist_nulls.h | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..efd51bf 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -37,8 +37,9 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
  	}
  }

-#define hlist_nulls_first_rcu(head) \
-	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
+#define hlist_nulls_first_rcu(head)			\
+	(*((struct hlist_nulls_node __rcu __force **)	\
+	   &((volatile typeof(*head) *)head)->first))

  #define hlist_nulls_next_rcu(node) \
  	(*((struct hlist_nulls_node __rcu __force **)&(node)->next))
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* RE: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21  9:05 [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro Roman Gushchin
@ 2013-05-21 10:40 ` David Laight
  2013-05-21 11:55   ` Roman Gushchin
  2013-05-21 12:09 ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: David Laight @ 2013-05-21 10:40 UTC (permalink / raw)
  To: Roman Gushchin, Dipankar Sarma, Paul E. McKenney
  Cc: zhmurov, linux-kernel, netdev, David S. Miller, Eric Dumazet,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy

> Some network functions (udp4_lib_lookup2(), for instance) use the
> hlist_nulls_for_each_entry_rcu macro in a way that assumes the loop may
> be restarted. In that case, it is strictly necessary to reread the
> head->first value from memory before each rescan.
> Without additional hints, gcc caches this value in a register, and if a
> cached node is moved to another chain during the scan, we can loop
> forever, reading wrong nulls values and restarting the loop endlessly.

Hmmm.... if either inet_ehashfn() or next_pseudo_random32() is
called gcc must reread it anyway.
I'm surprised gcc is generating separate code for all the conditional
loop endings. So why is it caching head->first?
The 'list empty' might be short-circuited - but that would only
be relevant after a rescan.
I suspect something else is going on.

I'd also have thought that this code needs to scan the entire
hash list. If things are moved under its feet this won't happen.
If it can end up on a different list (because a node got moved)
it is also possible for a later node to move it back.
In that case it would end up on the correct list
...
> -#define hlist_nulls_first_rcu(head) \
> -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> +#define hlist_nulls_first_rcu(head)			\
> +	(*((struct hlist_nulls_node __rcu __force **)	\
> +	   &((volatile typeof(*head) *)head)->first))

I'd have thought it would be better to change hlist_nulls_first_rcu().

	David




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 10:40 ` David Laight
@ 2013-05-21 11:55   ` Roman Gushchin
  2013-05-21 13:42     ` David Laight
  0 siblings, 1 reply; 57+ messages in thread
From: Roman Gushchin @ 2013-05-21 11:55 UTC (permalink / raw)
  To: David Laight
  Cc: Dipankar Sarma, Paul E. McKenney, zhmurov, linux-kernel, netdev,
	David S. Miller, Eric Dumazet, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 21.05.2013 14:40, David Laight wrote:
>> Some network functions (udp4_lib_lookup2(), for instance) use the
>> hlist_nulls_for_each_entry_rcu macro in a way that assumes the loop may
>> be restarted. In that case, it is strictly necessary to reread the
>> head->first value from memory before each rescan.
>> Without additional hints, gcc caches this value in a register, and if a
>> cached node is moved to another chain during the scan, we can loop
>> forever, reading wrong nulls values and restarting the loop endlessly.
>
> Hmmm.... if either inet_ehashfn() or next_pseudo_random32() is
> called gcc must reread it anyway.
> I'm surprised gcc is generating separate code for all the conditional
> loop endings. So why is it caching head->first?
> The 'list empty' might be short-circuited - but that would only
> be relevant after a rescan.
> I suspect something else is going on.

What do you mean?

> I'd also have thought that this code needs to scan the entire
> hash list. If things are moved under its feet this won't happen.
> If it can end up on a different list (because a node got moved)
> it is also possible for a later node to move it back.
> In that case it would end up on the correct list

Things are always moved to the head of the list, so it's not a problem.

> ...
>> -#define hlist_nulls_first_rcu(head) \
>> -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
>> +#define hlist_nulls_first_rcu(head)			\
>> +	(*((struct hlist_nulls_node __rcu __force **)	\
>> +	   &((volatile typeof(*head) *)head)->first))
>
> I'd have thought it would be better to change hlist_nulls_first_rcu().

That's exactly what I'm suggesting. Maybe I'm missing something? Please clarify.

Regards,
Roman

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21  9:05 [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro Roman Gushchin
  2013-05-21 10:40 ` David Laight
@ 2013-05-21 12:09 ` Paul E. McKenney
  2013-05-21 12:46   ` Roman Gushchin
                     ` (2 more replies)
  1 sibling, 3 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-05-21 12:09 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Dipankar Sarma, zhmurov, linux-kernel, netdev, David S. Miller,
	Eric Dumazet, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy

On Tue, May 21, 2013 at 01:05:48PM +0400, Roman Gushchin wrote:
> Hi, all!
> 
> This is a fix for a problem described here:
> https://lkml.org/lkml/2013/4/16/371 .
> ---
> 
> Some network functions (udp4_lib_lookup2(), for instance) use the
> hlist_nulls_for_each_entry_rcu macro in a way that assumes the loop may
> be restarted. In that case, it is strictly necessary to reread the
> head->first value from memory before each rescan.
> Without additional hints, gcc caches this value in a register, and if a
> cached node is moved to another chain during the scan, we can loop
> forever, reading wrong nulls values and restarting the loop endlessly.
> 
> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
> ---
>  include/linux/rculist_nulls.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 2ae1371..efd51bf 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -37,8 +37,9 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
>  	}
>  }
> 
> -#define hlist_nulls_first_rcu(head) \
> -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> +#define hlist_nulls_first_rcu(head)			\
> +	(*((struct hlist_nulls_node __rcu __force **)	\
> +	   &((volatile typeof(*head) *)head)->first))

Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?

							Thanx, Paul

>  #define hlist_nulls_next_rcu(node) \
>  	(*((struct hlist_nulls_node __rcu __force **)&(node)->next))
> -- 
> 1.8.1.2
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 12:09 ` Paul E. McKenney
@ 2013-05-21 12:46   ` Roman Gushchin
  2013-05-21 12:58     ` Paul E. McKenney
  2013-05-21 13:37   ` Eric Dumazet
  2013-05-21 13:44   ` Eric Dumazet
  2 siblings, 1 reply; 57+ messages in thread
From: Roman Gushchin @ 2013-05-21 12:46 UTC (permalink / raw)
  To: paulmck
  Cc: Dipankar Sarma, zhmurov, linux-kernel, netdev, David S. Miller,
	Eric Dumazet, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy

On 21.05.2013 16:09, Paul E. McKenney wrote:
> On Tue, May 21, 2013 at 01:05:48PM +0400, Roman Gushchin wrote:
>> Hi, all!
>>
>> This is a fix for a problem described here:
>> https://lkml.org/lkml/2013/4/16/371 .
>> ---
>>
>> Some network functions (udp4_lib_lookup2(), for instance) use the
>> hlist_nulls_for_each_entry_rcu macro in a way that assumes the loop may
>> be restarted. In that case, it is strictly necessary to reread the
>> head->first value from memory before each rescan.
>> Without additional hints, gcc caches this value in a register, and if a
>> cached node is moved to another chain during the scan, we can loop
>> forever, reading wrong nulls values and restarting the loop endlessly.
>>
>> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
>> Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
>> ---
>>   include/linux/rculist_nulls.h | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
>> index 2ae1371..efd51bf 100644
>> --- a/include/linux/rculist_nulls.h
>> +++ b/include/linux/rculist_nulls.h
>> @@ -37,8 +37,9 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
>>   	}
>>   }
>>
>> -#define hlist_nulls_first_rcu(head) \
>> -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
>> +#define hlist_nulls_first_rcu(head)			\
>> +	(*((struct hlist_nulls_node __rcu __force **)	\
>> +	   &((volatile typeof(*head) *)head)->first))
>
> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?

That would be nice, but it would require keeping the old variant too (for
use in hlist_nulls_add_head_rcu() as the rcu_assign_pointer() argument).
Do you think it's better?

Regards,
Roman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 12:46   ` Roman Gushchin
@ 2013-05-21 12:58     ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-05-21 12:58 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Dipankar Sarma, zhmurov, linux-kernel, netdev, David S. Miller,
	Eric Dumazet, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy

On Tue, May 21, 2013 at 04:46:54PM +0400, Roman Gushchin wrote:
> On 21.05.2013 16:09, Paul E. McKenney wrote:
> >On Tue, May 21, 2013 at 01:05:48PM +0400, Roman Gushchin wrote:
> >>Hi, all!
> >>
> >>This is a fix for a problem described here:
> >>https://lkml.org/lkml/2013/4/16/371 .
> >>---
> >>
> >>Some network functions (udp4_lib_lookup2(), for instance) use the
> >>hlist_nulls_for_each_entry_rcu macro in a way that assumes the loop may
> >>be restarted. In that case, it is strictly necessary to reread the
> >>head->first value from memory before each rescan.
> >>Without additional hints, gcc caches this value in a register, and if a
> >>cached node is moved to another chain during the scan, we can loop
> >>forever, reading wrong nulls values and restarting the loop endlessly.
> >>
> >>Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> >>Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
> >>---
> >>  include/linux/rculist_nulls.h | 5 +++--
> >>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> >>index 2ae1371..efd51bf 100644
> >>--- a/include/linux/rculist_nulls.h
> >>+++ b/include/linux/rculist_nulls.h
> >>@@ -37,8 +37,9 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
> >>  	}
> >>  }
> >>
> >>-#define hlist_nulls_first_rcu(head) \
> >>-	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> >>+#define hlist_nulls_first_rcu(head)			\
> >>+	(*((struct hlist_nulls_node __rcu __force **)	\
> >>+	   &((volatile typeof(*head) *)head)->first))
> >
> >Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?
> 
> That would be nice, but it would require keeping the old variant too (for
> use in hlist_nulls_add_head_rcu() as the rcu_assign_pointer() argument).
> Do you think it's better?

Both ACCESS_ONCE() and rcu_dereference_raw() can be used by updaters
as well as readers, so yes, I do think that it is better.  Better to
keep the encapsulation rather than having to search for lots of volatile
casts should this idiom ever need to change.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 12:09 ` Paul E. McKenney
  2013-05-21 12:46   ` Roman Gushchin
@ 2013-05-21 13:37   ` Eric Dumazet
  2013-05-21 13:44   ` Eric Dumazet
  2 siblings, 0 replies; 57+ messages in thread
From: Eric Dumazet @ 2013-05-21 13:37 UTC (permalink / raw)
  To: paulmck
  Cc: Roman Gushchin, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:
> > 
> > -#define hlist_nulls_first_rcu(head) \
> > -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> > +#define hlist_nulls_first_rcu(head)			\
> > +	(*((struct hlist_nulls_node __rcu __force **)	\
> > +	   &((volatile typeof(*head) *)head)->first))
> 
> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?
> 

+1

Very good catch Roman !

Thanks !



^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 11:55   ` Roman Gushchin
@ 2013-05-21 13:42     ` David Laight
  0 siblings, 0 replies; 57+ messages in thread
From: David Laight @ 2013-05-21 13:42 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Dipankar Sarma, Paul E. McKenney, zhmurov, linux-kernel, netdev,
	David S. Miller, Eric Dumazet, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

> >> -#define hlist_nulls_first_rcu(head) \
> >> -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> >> +#define hlist_nulls_first_rcu(head)			\
> >> +	(*((struct hlist_nulls_node __rcu __force **)	\
> >> +	   &((volatile typeof(*head) *)head)->first))
> >
> > I'd have thought it would be better to change hlist_nulls_first_rcu().
> 
> That's exactly what I'm suggesting. Maybe I'm missing something? Please clarify.

Gah - chasing through too many defines in too many header files!
Actually making the 'head' structure have volatile pointers
might be better.
Possibly in struct hlist_nulls_node itself.
There are toooo many casts for sanity :-)

	David




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 12:09 ` Paul E. McKenney
  2013-05-21 12:46   ` Roman Gushchin
  2013-05-21 13:37   ` Eric Dumazet
@ 2013-05-21 13:44   ` Eric Dumazet
  2013-05-21 14:47     ` Roman Gushchin
  2 siblings, 1 reply; 57+ messages in thread
From: Eric Dumazet @ 2013-05-21 13:44 UTC (permalink / raw)
  To: paulmck
  Cc: Roman Gushchin, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:

> > 
> > -#define hlist_nulls_first_rcu(head) \
> > -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> > +#define hlist_nulls_first_rcu(head)			\
> > +	(*((struct hlist_nulls_node __rcu __force **)	\
> > +	   &((volatile typeof(*head) *)head)->first))
> 
> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?

More exactly we have :

#define list_entry_rcu(ptr, type, member) \
        ({typeof (*ptr) __rcu *__ptr = (typeof (*ptr) __rcu __force *)ptr; \
         container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
        })

#define list_for_each_entry_rcu(pos, head, member) \
        for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
                &pos->member != (head); \
                pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
<< and >>

#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                 \
        for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
                (!is_a_nulls(pos)) &&                                           \
                ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
                pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))

We need to change hlist_nulls_for_each_entry_rcu() to use the same construct,
so that the rcu_dereference_raw() is performed in the right place.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 13:44   ` Eric Dumazet
@ 2013-05-21 14:47     ` Roman Gushchin
  2013-05-21 15:16       ` Eric Dumazet
  2013-05-21 15:38       ` Eric Dumazet
  0 siblings, 2 replies; 57+ messages in thread
From: Roman Gushchin @ 2013-05-21 14:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 21.05.2013 17:44, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:
>
>>>
>>> -#define hlist_nulls_first_rcu(head) \
>>> -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
>>> +#define hlist_nulls_first_rcu(head)			\
>>> +	(*((struct hlist_nulls_node __rcu __force **)	\
>>> +	   &((volatile typeof(*head) *)head)->first))
>>
>> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?
>
> More exactly we have :
>
> #define list_entry_rcu(ptr, type, member) \
>          ({typeof (*ptr) __rcu *__ptr = (typeof (*ptr) __rcu __force *)ptr; \
>           container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
>          })
>
> #define list_for_each_entry_rcu(pos, head, member) \
>          for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
>                  &pos->member != (head); \
>                  pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> << and >>
>
> #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                 \
>          for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
>                  (!is_a_nulls(pos)) &&                                           \
>                  ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>                  pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>
> We need to change hlist_nulls_for_each_entry_rcu() to use the same construct,
> so that the rcu_dereference_raw() is performed in the right place.

No.

This code has the same mistake: it is rcu_dereference_raw(head->first),
so there is nothing that prevents gcc from storing the (head->first)
value in a register.

Let's look at the assembler (3.4.46, net/ipv4/udp.c, udp4_lib_lookup2()):

(0): here we get the head address from the stack
(1): here we get head->first
(2): this is the place where we jump (by mistake) in the original version
from (3), (4) and (5). But in (4) we do the same as in (1), so it's not a
problem. In the patched version we always jump to (1).

1) original version:
0000000000002840 <udp4_lib_lookup2.isra.35>:
     2840:	41 57                	push   %r15
     2842:	41 89 d7             	mov    %edx,%r15d
     2845:	41 56                	push   %r14
     2847:	41 55                	push   %r13
     2849:	41 54                	push   %r12
     284b:	55                   	push   %rbp
     284c:	48 89 fd             	mov    %rdi,%rbp
     284f:	53                   	push   %rbx
     2850:	48 83 ec 28          	sub    $0x28,%rsp
(0) 2854:	48 8b 44 24 60       	mov    0x60(%rsp),%rax
     2859:	44 8b 64 24 68       	mov    0x68(%rsp),%r12d
(1) 285e:	4c 8b 28             	mov    (%rax),%r13
(2) 2861:	31 ff                	xor    %edi,%edi
     2863:	41 f6 c5 01          	test   $0x1,%r13b
     2867:	4d 89 ea             	mov    %r13,%r10
     286a:	bb ff ff ff ff       	mov    $0xffffffff,%ebx
     286f:	74 32                	je     28a3 <udp4_lib_lookup2.isra.35+0x63>
     2871:	e9 20 02 00 00       	jmpq   2a96 <udp4_lib_lookup2.isra.35+0x256>
     ...
     2940:	49 d1 ea             	shr    %r10
     2943:	4d 39 e2             	cmp    %r12,%r10
(3) 2946:	0f 85 15 ff ff ff    	jne    2861 <udp4_lib_lookup2.isra.35+0x21>
     294c:	48 85 ff             	test   %rdi,%rdi
     294f:	74 27                	je     2978 <udp4_lib_lookup2.isra.35+0x138>
     2951:	41 89 db             	mov    %ebx,%r11d
     2954:	48 8d 5f 4c          	lea    0x4c(%rdi),%rbx
     2958:	41 ba 02 00 00 00    	mov    $0x2,%r10d
     295e:	66 90                	xchg   %ax,%ax
     2960:	45 8d 6a 01          	lea    0x1(%r10),%r13d
     2964:	44 89 d0             	mov    %r10d,%eax
     2967:	f0 44 0f b1 2b       	lock cmpxchg %r13d,(%rbx)
     296c:	41 39 c2             	cmp    %eax,%r10d
     296f:	74 36                	je     29a7 <udp4_lib_lookup2.isra.35+0x167>
     2971:	85 c0                	test   %eax,%eax
     2973:	41 89 c2             	mov    %eax,%r10d
     2976:	75 e8                	jne    2960 <udp4_lib_lookup2.isra.35+0x120>
     2978:	31 ff                	xor    %edi,%edi
     297a:	48 83 c4 28          	add    $0x28,%rsp
     297e:	48 89 f8             	mov    %rdi,%rax
     2981:	5b                   	pop    %rbx
     2982:	5d                   	pop    %rbp
     2983:	41 5c                	pop    %r12
     2985:	41 5d                	pop    %r13
     2987:	41 5e                	pop    %r14
     2989:	41 5f                	pop    %r15
     298b:	c3                   	retq
     298c:	0f 1f 40 00          	nopl   0x0(%rax)
     2990:	4d 8b b2 58 02 00 00 	mov    0x258(%r10),%r14
     2997:	41 f6 46 6e 10       	testb  $0x10,0x6e(%r14)
     299c:	0f 85 de fe ff ff    	jne    2880 <udp4_lib_lookup2.isra.35+0x40>
     29a2:	e9 17 ff ff ff       	jmpq   28be <udp4_lib_lookup2.isra.35+0x7e>
     29a7:	48 3b 6f 30          	cmp    0x30(%rdi),%rbp
     29ab:	74 43                	je     29f0 <udp4_lib_lookup2.isra.35+0x1b0>
     29ad:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     29b2:	41 39 c3             	cmp    %eax,%r11d
     29b5:	7e c3                	jle    297a <udp4_lib_lookup2.isra.35+0x13a>
     29b7:	89 4c 24 10          	mov    %ecx,0x10(%rsp)
     29bb:	89 74 24 18          	mov    %esi,0x18(%rsp)
     29bf:	44 89 44 24 08       	mov    %r8d,0x8(%rsp)
     29c4:	44 89 0c 24          	mov    %r9d,(%rsp)
     29c8:	e8 c3 dc ff ff       	callq  690 <sock_put>
     29cd:	48 8b 44 24 60       	mov    0x60(%rsp),%rax
     29d2:	8b 4c 24 10          	mov    0x10(%rsp),%ecx
     29d6:	8b 74 24 18          	mov    0x18(%rsp),%esi
     29da:	44 8b 44 24 08       	mov    0x8(%rsp),%r8d
     29df:	44 8b 0c 24          	mov    (%rsp),%r9d
     29e3:	4c 8b 28             	mov    (%rax),%r13
(4) 29e6:	e9 76 fe ff ff       	jmpq   2861 <udp4_lib_lookup2.isra.35+0x21>
     29eb:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
     29f0:	44 0f b7 57 0c       	movzwl 0xc(%rdi),%r10d
     29f5:	66 41 83 fa 0a       	cmp    $0xa,%r10w
     29fa:	74 7f                	je     2a7b <udp4_lib_lookup2.isra.35+0x23b>
     29fc:	3b 4f 04             	cmp    0x4(%rdi),%ecx
     29ff:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     2a04:	75 ac                	jne    29b2 <udp4_lib_lookup2.isra.35+0x172>
     2a06:	0f b7 9f 7a 02 00 00 	movzwl 0x27a(%rdi),%ebx
     2a0d:	41 39 d8             	cmp    %ebx,%r8d
     2a10:	75 a0                	jne    29b2 <udp4_lib_lookup2.isra.35+0x172>
     2a12:	8b 1f                	mov    (%rdi),%ebx
     2a14:	66 41 83 fa 02       	cmp    $0x2,%r10w
     2a19:	41 0f 94 c2          	sete   %r10b
     2a1d:	45 0f b6 d2          	movzbl %r10b,%r10d
     2a21:	85 db                	test   %ebx,%ebx
     2a23:	44 89 d0             	mov    %r10d,%eax
     2a26:	74 0d                	je     2a35 <udp4_lib_lookup2.isra.35+0x1f5>
     2a28:	39 de                	cmp    %ebx,%esi
     2a2a:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     2a2f:	75 81                	jne    29b2 <udp4_lib_lookup2.isra.35+0x172>
     2a31:	41 8d 42 02          	lea    0x2(%r10),%eax
     2a35:	44 0f b7 97 78 02 00 	movzwl 0x278(%rdi),%r10d
     2a3c:	00
     2a3d:	66 45 85 d2          	test   %r10w,%r10w
     2a41:	74 0d                	je     2a50 <udp4_lib_lookup2.isra.35+0x210>
     2a43:	66 45 39 d7          	cmp    %r10w,%r15w
     2a47:	0f 85 60 ff ff ff    	jne    29ad <udp4_lib_lookup2.isra.35+0x16d>
     2a4d:	83 c0 02             	add    $0x2,%eax
     2a50:	44 8b 57 10          	mov    0x10(%rdi),%r10d
     2a54:	45 85 d2             	test   %r10d,%r10d
     2a57:	0f 84 55 ff ff ff    	je     29b2 <udp4_lib_lookup2.isra.35+0x172>
     2a5d:	8d 58 02             	lea    0x2(%rax),%ebx
     2a60:	45 39 d1             	cmp    %r10d,%r9d
     2a63:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     2a68:	0f 44 c3             	cmove  %ebx,%eax
     2a6b:	e9 42 ff ff ff       	jmpq   29b2 <udp4_lib_lookup2.isra.35+0x172>
     2a70:	41 bb ff ff ff ff    	mov    $0xffffffff,%r11d
     2a76:	e9 05 fe ff ff       	jmpq   2880 <udp4_lib_lookup2.isra.35+0x40>
     2a7b:	48 8b 9f 70 02 00 00 	mov    0x270(%rdi),%rbx
     2a82:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     2a87:	f6 43 6e 10          	testb  $0x10,0x6e(%rbx)
     2a8b:	0f 85 21 ff ff ff    	jne    29b2 <udp4_lib_lookup2.isra.35+0x172>
     2a91:	e9 66 ff ff ff       	jmpq   29fc <udp4_lib_lookup2.isra.35+0x1bc>
     2a96:	4c 89 e8             	mov    %r13,%rax
     2a99:	48 d1 e8             	shr    %rax
     2a9c:	4c 39 e0             	cmp    %r12,%rax
(5) 2a9f:	0f 85 bc fd ff ff    	jne    2861 <udp4_lib_lookup2.isra.35+0x21>
     2aa5:	e9 ce fe ff ff       	jmpq   2978 <udp4_lib_lookup2.isra.35+0x138>
     2aaa:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

2) patched version:
0000000000002840 <udp4_lib_lookup2>:
     2840:	41 57                	push   %r15
     2842:	41 89 d7             	mov    %edx,%r15d
     2845:	41 56                	push   %r14
     2847:	41 55                	push   %r13
     2849:	41 54                	push   %r12
     284b:	55                   	push   %rbp
     284c:	48 89 fd             	mov    %rdi,%rbp
     284f:	53                   	push   %rbx
     2850:	48 83 ec 28          	sub    $0x28,%rsp
     2854:	44 8b 6c 24 68       	mov    0x68(%rsp),%r13d
(0) 2859:	4c 8b 64 24 60       	mov    0x60(%rsp),%r12
(1) 285e:	4d 8b 14 24          	mov    (%r12),%r10
(2) 2862:	31 ff                	xor    %edi,%edi
     2864:	bb ff ff ff ff       	mov    $0xffffffff,%ebx
     2869:	41 f6 c2 01          	test   $0x1,%r10b
     286d:	74 2c                	je     289b <udp4_lib_lookup2+0x5b>
     286f:	e9 0a 02 00 00       	jmpq   2a7e <udp4_lib_lookup2+0x23e>
     ...
     2930:	49 d1 ea             	shr    %r10
     2933:	4d 39 d5             	cmp    %r10,%r13
(3) 2936:	0f 85 22 ff ff ff    	jne    285e <udp4_lib_lookup2+0x1e>
     293c:	48 85 ff             	test   %rdi,%rdi
     293f:	74 27                	je     2968 <udp4_lib_lookup2+0x128>
     2941:	41 89 db             	mov    %ebx,%r11d
     2944:	48 8d 5f 4c          	lea    0x4c(%rdi),%rbx
     2948:	41 ba 02 00 00 00    	mov    $0x2,%r10d
     294e:	66 90                	xchg   %ax,%ax
     2950:	45 8d 72 01          	lea    0x1(%r10),%r14d
     2954:	44 89 d0             	mov    %r10d,%eax
     2957:	f0 44 0f b1 33       	lock cmpxchg %r14d,(%rbx)
     295c:	41 39 c2             	cmp    %eax,%r10d
     295f:	74 36                	je     2997 <udp4_lib_lookup2+0x157>
     2961:	85 c0                	test   %eax,%eax
     2963:	41 89 c2             	mov    %eax,%r10d
     2966:	75 e8                	jne    2950 <udp4_lib_lookup2+0x110>
     2968:	31 ff                	xor    %edi,%edi
     296a:	48 83 c4 28          	add    $0x28,%rsp
     296e:	48 89 f8             	mov    %rdi,%rax
     2971:	5b                   	pop    %rbx
     2972:	5d                   	pop    %rbp
     2973:	41 5c                	pop    %r12
     2975:	41 5d                	pop    %r13
     2977:	41 5e                	pop    %r14
     2979:	41 5f                	pop    %r15
     297b:	c3                   	retq
     297c:	0f 1f 40 00          	nopl   0x0(%rax)
     2980:	4d 8b b2 58 02 00 00 	mov    0x258(%r10),%r14
     2987:	41 f6 46 6e 10       	testb  $0x10,0x6e(%r14)
     298c:	0f 85 e6 fe ff ff    	jne    2878 <udp4_lib_lookup2+0x38>
     2992:	e9 1f ff ff ff       	jmpq   28b6 <udp4_lib_lookup2+0x76>
     2997:	48 3b 6f 30          	cmp    0x30(%rdi),%rbp
     299b:	74 3b                	je     29d8 <udp4_lib_lookup2+0x198>
     299d:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     29a2:	41 39 c3             	cmp    %eax,%r11d
     29a5:	7e c3                	jle    296a <udp4_lib_lookup2+0x12a>
     29a7:	89 4c 24 10          	mov    %ecx,0x10(%rsp)
     29ab:	89 74 24 18          	mov    %esi,0x18(%rsp)
     29af:	44 89 44 24 08       	mov    %r8d,0x8(%rsp)
     29b4:	44 89 0c 24          	mov    %r9d,(%rsp)
     29b8:	e8 d3 dc ff ff       	callq  690 <sock_put>
     29bd:	8b 4c 24 10          	mov    0x10(%rsp),%ecx
     29c1:	8b 74 24 18          	mov    0x18(%rsp),%esi
     29c5:	44 8b 44 24 08       	mov    0x8(%rsp),%r8d
     29ca:	44 8b 0c 24          	mov    (%rsp),%r9d
(4) 29ce:	e9 8b fe ff ff       	jmpq   285e <udp4_lib_lookup2+0x1e>

     29d3:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
     29d8:	44 0f b7 57 0c       	movzwl 0xc(%rdi),%r10d
     29dd:	66 41 83 fa 0a       	cmp    $0xa,%r10w
     29e2:	74 7f                	je     2a63 <udp4_lib_lookup2+0x223>
     29e4:	3b 4f 04             	cmp    0x4(%rdi),%ecx
     29e7:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     29ec:	75 b4                	jne    29a2 <udp4_lib_lookup2+0x162>
     29ee:	0f b7 9f 7a 02 00 00 	movzwl 0x27a(%rdi),%ebx
     29f5:	41 39 d8             	cmp    %ebx,%r8d
     29f8:	75 a8                	jne    29a2 <udp4_lib_lookup2+0x162>
     29fa:	8b 1f                	mov    (%rdi),%ebx
     29fc:	66 41 83 fa 02       	cmp    $0x2,%r10w
     2a01:	41 0f 94 c2          	sete   %r10b
     2a05:	45 0f b6 d2          	movzbl %r10b,%r10d
     2a09:	85 db                	test   %ebx,%ebx
     2a0b:	44 89 d0             	mov    %r10d,%eax
     2a0e:	74 0d                	je     2a1d <udp4_lib_lookup2+0x1dd>
     2a10:	39 de                	cmp    %ebx,%esi
     2a12:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     2a17:	75 89                	jne    29a2 <udp4_lib_lookup2+0x162>
     2a19:	41 8d 42 02          	lea    0x2(%r10),%eax
     2a1d:	44 0f b7 97 78 02 00 	movzwl 0x278(%rdi),%r10d
     2a24:	00
     2a25:	66 45 85 d2          	test   %r10w,%r10w
     2a29:	74 0d                	je     2a38 <udp4_lib_lookup2+0x1f8>
     2a2b:	66 45 39 d7          	cmp    %r10w,%r15w
     2a2f:	0f 85 68 ff ff ff    	jne    299d <udp4_lib_lookup2+0x15d>
     2a35:	83 c0 02             	add    $0x2,%eax
     2a38:	44 8b 57 10          	mov    0x10(%rdi),%r10d
     2a3c:	45 85 d2             	test   %r10d,%r10d
     2a3f:	0f 84 5d ff ff ff    	je     29a2 <udp4_lib_lookup2+0x162>
     2a45:	8d 58 02             	lea    0x2(%rax),%ebx
     2a48:	45 39 d1             	cmp    %r10d,%r9d
     2a4b:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     2a50:	0f 44 c3             	cmove  %ebx,%eax
     2a53:	e9 4a ff ff ff       	jmpq   29a2 <udp4_lib_lookup2+0x162>
     2a58:	41 bb ff ff ff ff    	mov    $0xffffffff,%r11d
     2a5e:	e9 15 fe ff ff       	jmpq   2878 <udp4_lib_lookup2+0x38>
     2a63:	48 8b 9f 70 02 00 00 	mov    0x270(%rdi),%rbx
     2a6a:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     2a6f:	f6 43 6e 10          	testb  $0x10,0x6e(%rbx)
     2a73:	0f 85 29 ff ff ff    	jne    29a2 <udp4_lib_lookup2+0x162>
     2a79:	e9 66 ff ff ff       	jmpq   29e4 <udp4_lib_lookup2+0x1a4>
     2a7e:	49 d1 ea             	shr    %r10
     2a81:	4d 39 d5             	cmp    %r10,%r13
(5) 2a84:	0f 85 d4 fd ff ff    	jne    285e <udp4_lib_lookup2+0x1e>
     2a8a:	e9 d9 fe ff ff       	jmpq   2968 <udp4_lib_lookup2+0x128>
     2a8f:	90                   	nop



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 14:47     ` Roman Gushchin
@ 2013-05-21 15:16       ` Eric Dumazet
  2013-05-21 15:51         ` Roman Gushchin
  2013-05-21 15:38       ` Eric Dumazet
  1 sibling, 1 reply; 57+ messages in thread
From: Eric Dumazet @ 2013-05-21 15:16 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote:
> On 21.05.2013 17:44, Eric Dumazet wrote:
> > On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:
> >
> >>>
> >>> -#define hlist_nulls_first_rcu(head) \
> >>> -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> >>> +#define hlist_nulls_first_rcu(head)			\
> >>> +	(*((struct hlist_nulls_node __rcu __force **)	\
> >>> +	   &((volatile typeof(*head) *)head)->first))
> >>
> >> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?
> >
> > More exactly we have :
> >
> > #define list_entry_rcu(ptr, type, member) \
> >          ({typeof (*ptr) __rcu *__ptr = (typeof (*ptr) __rcu __force *)ptr; \
> >           container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
> >          })
> >
> > #define list_for_each_entry_rcu(pos, head, member) \
> >          for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> >                  &pos->member != (head); \
> >                  pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> > << and >>
> >
> > #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                 \
> >          for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
> >                  (!is_a_nulls(pos)) &&                                           \
> >                  ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
> >                  pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
> >
> > We need to change hlist_nulls_for_each_entry_rcu() to use same construct,
> > so that the rcu_dereference_raw() is performed at the right place.
> 
> No.
> 
> This code has the same mistake: it is rcu_dereference_raw(head->first), 
> so there is nothing that prevents gcc to store the (head->first) value 
> in a register.

Please read again what I wrote; you misunderstood.

hlist_nulls_for_each_entry_rcu() should use the same construct as
list_for_each_entry_rcu(), and not use rcu_dereference_raw()

Is that clear, or do you want me to send the patch?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 14:47     ` Roman Gushchin
  2013-05-21 15:16       ` Eric Dumazet
@ 2013-05-21 15:38       ` Eric Dumazet
  2013-05-21 15:51         ` Roman Gushchin
  2013-05-21 18:12         ` [PATCH v2] " Roman Gushchin
  1 sibling, 2 replies; 57+ messages in thread
From: Eric Dumazet @ 2013-05-21 15:38 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote:

> This code has the same mistake: it is rcu_dereference_raw(head->first), 
> so there is nothing that prevents gcc to store the (head->first) value 
> in a register.

If other rcu accessors have the same problem, a more complete patch is
needed.

Thanks




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 15:16       ` Eric Dumazet
@ 2013-05-21 15:51         ` Roman Gushchin
  0 siblings, 0 replies; 57+ messages in thread
From: Roman Gushchin @ 2013-05-21 15:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 21.05.2013 19:16, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote:
>> On 21.05.2013 17:44, Eric Dumazet wrote:
>>> On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:
>>>
>>>>>
>>>>> -#define hlist_nulls_first_rcu(head) \
>>>>> -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
>>>>> +#define hlist_nulls_first_rcu(head)			\
>>>>> +	(*((struct hlist_nulls_node __rcu __force **)	\
>>>>> +	   &((volatile typeof(*head) *)head)->first))
>>>>
>>>> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?
>>>
>>> More exactly we have :
>>>
>>> #define list_entry_rcu(ptr, type, member) \
>>>           ({typeof (*ptr) __rcu *__ptr = (typeof (*ptr) __rcu __force *)ptr; \
>>>            container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
>>>           })
>>>
>>> #define list_for_each_entry_rcu(pos, head, member) \
>>>           for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
>>>                   &pos->member != (head); \
>>>                   pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>>> << and >>
>>>
>>> #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                 \
>>>           for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
>>>                   (!is_a_nulls(pos)) &&                                           \
>>>                   ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>>>                   pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>>>
>>> We need to change hlist_nulls_for_each_entry_rcu() to use same construct,
>>> so that the rcu_dereference_raw() is performed at the right place.
>>
>> No.
>>
>> This code has the same mistake: it is rcu_dereference_raw(head->first),
>> so there is nothing that prevents gcc to store the (head->first) value
>> in a register.
>
> Please read again what I wrote, you misundertood.
>
> hlist_nulls_for_each_entry_rcu() should use same construct than
> list_for_each_entry_rcu(), and not use rcu_dereference_raw()
>
> Is that clear, or do you want me to send the patch ?

If you think that it will solve the problem, please send a patch; I 
think you are wrong here.
If you think only that it will look better, I agree.

Regards,
Roman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 15:38       ` Eric Dumazet
@ 2013-05-21 15:51         ` Roman Gushchin
  2013-05-21 18:12         ` [PATCH v2] " Roman Gushchin
  1 sibling, 0 replies; 57+ messages in thread
From: Roman Gushchin @ 2013-05-21 15:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 21.05.2013 19:38, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote:
>
>> This code has the same mistake: it is rcu_dereference_raw(head->first),
>> so there is nothing that prevents gcc to store the (head->first) value
>> in a register.
>
> If other rcu accessors have the same problem, a more complete patch is
> needed.

Ok, I'll try to provide a complete patch ASAP.

>
> Thanks
>
>
>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 15:38       ` Eric Dumazet
  2013-05-21 15:51         ` Roman Gushchin
@ 2013-05-21 18:12         ` Roman Gushchin
  2013-05-22  2:01           ` Eric Dumazet
  1 sibling, 1 reply; 57+ messages in thread
From: Roman Gushchin @ 2013-05-21 18:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

> If other rcu accessors have the same problem, a more complete patch is
> needed.

[PATCH] rcu: fix a race in rcu lists traverse macros

Some network functions (udp4_lib_lookup2(), for instance) use the
RCU list traversal macros (hlist_nulls_for_each_entry_rcu, for instance)
in a way that assumes the loop may be restarted. In that case, it is strictly
necessary to re-read the head->first value from memory before each scan.
Without additional hints, gcc caches this value in a register, and if a
cached node is moved to another chain during the scan, we can loop
forever, getting wrong nulls values and restarting the loop endlessly.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
---
  include/linux/compiler.h      | 6 ++++++
  include/linux/rculist.h       | 6 ++++--
  include/linux/rculist_nulls.h | 3 ++-
  3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..0e05d7c 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -351,6 +351,12 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
   */
  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

+/*
+ * Prevent the compiler from optimizing accesses to a structure member.
+ */
+#define GET_STRUCT_FIELD_VOLATILE(struct_ptr, field)		\
+	(((volatile typeof(*struct_ptr) *)struct_ptr)->field)
+
  /* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
  #ifdef CONFIG_KPROBES
  # define __kprobes	__attribute__((__section__(".kprobes.text")))
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 8089e35..6c3eea7 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -282,7 +282,8 @@ static inline void list_splice_init_rcu(struct list_head *list,
   * as long as the traversal is guarded by rcu_read_lock().
   */
  #define list_for_each_entry_rcu(pos, head, member) \
-	for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
+	for (pos = list_entry_rcu(GET_STRUCT_FIELD_VOLATILE(head, next), \
+				  typeof(*pos), member); \
  		&pos->member != (head); \
  		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

@@ -348,7 +349,8 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
  /*
   * return the first or the next element in an RCU protected hlist
   */
-#define hlist_first_rcu(head)	(*((struct hlist_node __rcu **)(&(head)->first)))
+#define hlist_first_rcu(head)	(*((struct hlist_node __rcu **)		\
+				   (&GET_STRUCT_FIELD_VOLATILE(head, first))))
  #define hlist_next_rcu(node)	(*((struct hlist_node __rcu **)(&(node)->next)))
  #define hlist_pprev_rcu(node)	(*((struct hlist_node __rcu **)((node)->pprev)))

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..b04fd0a 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -38,7 +38,8 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
  }

  #define hlist_nulls_first_rcu(head) \
-	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
+	(*((struct hlist_nulls_node __rcu __force **)	\
+	   &(GET_STRUCT_FIELD_VOLATILE(head, first))))

  #define hlist_nulls_next_rcu(node) \
  	(*((struct hlist_nulls_node __rcu __force **)&(node)->next))
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-21 18:12         ` [PATCH v2] " Roman Gushchin
@ 2013-05-22  2:01           ` Eric Dumazet
  2013-05-22  5:49             ` Eric Dumazet
  2013-05-22  9:58             ` Paul E. McKenney
  0 siblings, 2 replies; 57+ messages in thread
From: Eric Dumazet @ 2013-05-22  2:01 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Tue, 2013-05-21 at 22:12 +0400, Roman Gushchin wrote:
> > If other rcu accessors have the same problem, a more complete patch is
> > needed.
> 
> [PATCH] rcu: fix a race in rcu lists traverse macros
> 
> Some network functions (udp4_lib_lookup2(), for instance) use the
> rcu lists traverse macros (hlist_nulls_for_each_entry_rcu, for instance)
> in a way that assumes restarting of a loop. In this case, it is strictly
> necessary to reread the head->first value from the memory before each scan.
> Without additional hints, gcc caches this value in a register. In this case,
> if a cached node is moved to another chain during the scan, we can loop
> forever getting wrong nulls values and restarting the loop uninterruptedly.
> 
> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
> ---
>   include/linux/compiler.h      | 6 ++++++
>   include/linux/rculist.h       | 6 ++++--
>   include/linux/rculist_nulls.h | 3 ++-
>   3 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 92669cd..0e05d7c 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -351,6 +351,12 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
>    */
>   #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> 
> +/*
> + * Prevent the compiler from optimizing accesses to structure member.
> + */
> +#define GET_STRUCT_FIELD_VOLATILE(struct_ptr, field)		\
> +	(((volatile typeof(*struct_ptr) *)struct_ptr)->field)
> +
>   /* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
>   #ifdef CONFIG_KPROBES
>   # define __kprobes	__attribute__((__section__(".kprobes.text")))
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 8089e35..6c3eea7 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -282,7 +282,8 @@ static inline void list_splice_init_rcu(struct list_head *list,
>    * as long as the traversal is guarded by rcu_read_lock().
>    */
>   #define list_for_each_entry_rcu(pos, head, member) \
> -	for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> +	for (pos = list_entry_rcu(GET_STRUCT_FIELD_VOLATILE(head, next), \
> +				  typeof(*pos), member); \
>   		&pos->member != (head); \
>   		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> 
> @@ -348,7 +349,8 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
>   /*
>    * return the first or the next element in an RCU protected hlist
>    */
> -#define hlist_first_rcu(head)	(*((struct hlist_node __rcu **)(&(head)->first)))
> +#define hlist_first_rcu(head)	(*((struct hlist_node __rcu **)		\
> +				   (&GET_STRUCT_FIELD_VOLATILE(head, first))))
>   #define hlist_next_rcu(node)	(*((struct hlist_node __rcu **)(&(node)->next)))
>   #define hlist_pprev_rcu(node)	(*((struct hlist_node __rcu **)((node)->pprev)))
> 
> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 2ae1371..b04fd0a 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -38,7 +38,8 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
>   }
> 
>   #define hlist_nulls_first_rcu(head) \
> -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> +	(*((struct hlist_nulls_node __rcu __force **)	\
> +	   &(GET_STRUCT_FIELD_VOLATILE(head, first))))
> 
>   #define hlist_nulls_next_rcu(node) \
>   	(*((struct hlist_nulls_node __rcu __force **)&(node)->next))

Please use ACCESS_ONCE(), which is the standard way to deal with this,
and remove the rcu_dereference_raw() in 
hlist_nulls_for_each_entry_rcu()

something like : (for the nulls part only)

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..1d485d3 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -40,6 +40,9 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
 #define hlist_nulls_first_rcu(head) \
 	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
 
+#define hlist_nulls_first_rcu_once(head) \
+	ACCESS_ONCE(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
+
 #define hlist_nulls_next_rcu(node) \
 	(*((struct hlist_nulls_node __rcu __force **)&(node)->next))
 
@@ -107,7 +110,7 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
  *
  */
 #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
-	for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
+	for (pos = hlist_nulls_first_rcu_once(head);                            \
 		(!is_a_nulls(pos)) &&                                           \
 		({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
                pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))




^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22  2:01           ` Eric Dumazet
@ 2013-05-22  5:49             ` Eric Dumazet
  2013-05-22 11:58               ` Roman Gushchin
  2013-05-22  9:58             ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Eric Dumazet @ 2013-05-22  5:49 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Tue, 2013-05-21 at 19:01 -0700, Eric Dumazet wrote:

> Please use ACCESS_ONCE(), which is the standard way to deal with this,
> and remove the rcu_dereference_raw() in 
> hlist_nulls_for_each_entry_rcu()
> 
> something like : (for the nulls part only)

Thinking a bit more about this, I am a bit worried by other uses of
ACCESS_ONCE(ptr->field)

The intent of rcu_dereference(ptr->field) is to reload ptr->field from
memory every time.

Do we really need a

#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x)) 

and change all rcu_dereference(ptr->field) occurrences?

I probably miss something obvious.

Anyway, the following patch also seems to solve the problem; I need some sleep to understand why.


diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0bf5d39..10b5c54 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -410,7 +410,7 @@ static inline int compute_score2(struct sock *sk, struct net *net,
 static struct sock *udp4_lib_lookup2(struct net *net,
 		__be32 saddr, __be16 sport,
 		__be32 daddr, unsigned int hnum, int dif,
-		struct udp_hslot *hslot2, unsigned int slot2)
+		struct hlist_nulls_head *head, unsigned int slot2)
 {
 	struct sock *sk, *result;
 	struct hlist_nulls_node *node;
@@ -420,7 +420,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 begin:
 	result = NULL;
 	badness = 0;
-	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
+	udp_portaddr_for_each_entry_rcu(sk, node, head) {
 		score = compute_score2(sk, net, saddr, sport,
 				      daddr, hnum, dif);
 		if (score > badness) {
@@ -483,7 +483,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 
 		result = udp4_lib_lookup2(net, saddr, sport,
 					  daddr, hnum, dif,
-					  hslot2, slot2);
+					  &hslot2->head, slot2);
 		if (!result) {
 			hash2 = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
 			slot2 = hash2 & udptable->mask;
@@ -493,7 +493,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 
 			result = udp4_lib_lookup2(net, saddr, sport,
 						  htonl(INADDR_ANY), hnum, dif,
-						  hslot2, slot2);
+						  &hslot2->head, slot2);
 		}
 		rcu_read_unlock();
 		return result;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 42923b1..b5bc667 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -200,7 +200,7 @@ static inline int compute_score2(struct sock *sk, struct net *net,
 static struct sock *udp6_lib_lookup2(struct net *net,
 		const struct in6_addr *saddr, __be16 sport,
 		const struct in6_addr *daddr, unsigned int hnum, int dif,
-		struct udp_hslot *hslot2, unsigned int slot2)
+		struct hlist_nulls_head *head, unsigned int slot2)
 {
 	struct sock *sk, *result;
 	struct hlist_nulls_node *node;
@@ -210,7 +210,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 begin:
 	result = NULL;
 	badness = -1;
-	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
+	udp_portaddr_for_each_entry_rcu(sk, node, head) {
 		score = compute_score2(sk, net, saddr, sport,
 				      daddr, hnum, dif);
 		if (score > badness) {
@@ -274,7 +274,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
 
 		result = udp6_lib_lookup2(net, saddr, sport,
 					  daddr, hnum, dif,
-					  hslot2, slot2);
+					  &hslot2->head, slot2);
 		if (!result) {
 			hash2 = udp6_portaddr_hash(net, &in6addr_any, hnum);
 			slot2 = hash2 & udptable->mask;
@@ -284,7 +284,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
 
 			result = udp6_lib_lookup2(net, saddr, sport,
 						  &in6addr_any, hnum, dif,
-						  hslot2, slot2);
+						  &hslot2->head, slot2);
 		}
 		rcu_read_unlock();
 		return result;





^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22  2:01           ` Eric Dumazet
  2013-05-22  5:49             ` Eric Dumazet
@ 2013-05-22  9:58             ` Paul E. McKenney
  2013-05-22 12:28               ` Eric Dumazet
  1 sibling, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-05-22  9:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Roman Gushchin, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Tue, May 21, 2013 at 07:01:20PM -0700, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 22:12 +0400, Roman Gushchin wrote:
> > > If other rcu accessors have the same problem, a more complete patch is
> > > needed.
> > 
> > [PATCH] rcu: fix a race in rcu lists traverse macros
> > 
> > Some network functions (udp4_lib_lookup2(), for instance) use the
> > rcu lists traverse macros (hlist_nulls_for_each_entry_rcu, for instance)
> > in a way that assumes restarting of a loop. In this case, it is strictly
> > necessary to reread the head->first value from the memory before each scan.
> > Without additional hints, gcc caches this value in a register. In this case,
> > if a cached node is moved to another chain during the scan, we can loop
> > forever getting wrong nulls values and restarting the loop uninterruptedly.
> > 
> > Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> > Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
> > ---
> >   include/linux/compiler.h      | 6 ++++++
> >   include/linux/rculist.h       | 6 ++++--
> >   include/linux/rculist_nulls.h | 3 ++-
> >   3 files changed, 12 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index 92669cd..0e05d7c 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -351,6 +351,12 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
> >    */
> >   #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> > 
> > +/*
> > + * Prevent the compiler from optimizing accesses to structure member.
> > + */
> > +#define GET_STRUCT_FIELD_VOLATILE(struct_ptr, field)		\
> > +	(((volatile typeof(*struct_ptr) *)struct_ptr)->field)
> > +
> >   /* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
> >   #ifdef CONFIG_KPROBES
> >   # define __kprobes	__attribute__((__section__(".kprobes.text")))
> > diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> > index 8089e35..6c3eea7 100644
> > --- a/include/linux/rculist.h
> > +++ b/include/linux/rculist.h
> > @@ -282,7 +282,8 @@ static inline void list_splice_init_rcu(struct list_head *list,
> >    * as long as the traversal is guarded by rcu_read_lock().
> >    */
> >   #define list_for_each_entry_rcu(pos, head, member) \
> > -	for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> > +	for (pos = list_entry_rcu(GET_STRUCT_FIELD_VOLATILE(head, next), \
> > +				  typeof(*pos), member); \
> >   		&pos->member != (head); \
> >   		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> > 
> > @@ -348,7 +349,8 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
> >   /*
> >    * return the first or the next element in an RCU protected hlist
> >    */
> > -#define hlist_first_rcu(head)	(*((struct hlist_node __rcu **)(&(head)->first)))
> > +#define hlist_first_rcu(head)	(*((struct hlist_node __rcu **)		\
> > +				   (&GET_STRUCT_FIELD_VOLATILE(head, first))))
> >   #define hlist_next_rcu(node)	(*((struct hlist_node __rcu **)(&(node)->next)))
> >   #define hlist_pprev_rcu(node)	(*((struct hlist_node __rcu **)((node)->pprev)))
> > 
> > diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> > index 2ae1371..b04fd0a 100644
> > --- a/include/linux/rculist_nulls.h
> > +++ b/include/linux/rculist_nulls.h
> > @@ -38,7 +38,8 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
> >   }
> > 
> >   #define hlist_nulls_first_rcu(head) \
> > -	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> > +	(*((struct hlist_nulls_node __rcu __force **)	\
> > +	   &(GET_STRUCT_FIELD_VOLATILE(head, first))))
> > 
> >   #define hlist_nulls_next_rcu(node) \
> >   	(*((struct hlist_nulls_node __rcu __force **)&(node)->next))

Now that I am more awake...

The RCU list macros assume that the list header is either statically
allocated (in which case no ACCESS_ONCE() or whatever is needed) or
that the caller did whatever was necessary to protect the list header,
whether that be holding the right lock, using rcu_dereference() when
traversing the pointer to the list header, or whatever.

Maybe I need to document this assumption...

						Thanx, Paul

> Please use ACCESS_ONCE(), which is the standard way to deal with this,
> and remove the rcu_dereference_raw() in 
> hlist_nulls_for_each_entry_rcu()
> 
> something like : (for the nulls part only)
> 
> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 2ae1371..1d485d3 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -40,6 +40,9 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
>  #define hlist_nulls_first_rcu(head) \
>  	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> 
> +#define hlist_nulls_first_rcu_once(head) \
> +	ACCESS_ONCE(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> +
>  #define hlist_nulls_next_rcu(node) \
>  	(*((struct hlist_nulls_node __rcu __force **)&(node)->next))
> 
> @@ -107,7 +110,7 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
>   *
>   */
>  #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
> -	for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
> +	for (pos = hlist_nulls_first_rcu_once(head);                            \
>  		(!is_a_nulls(pos)) &&                                           \
>  		({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>                 pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
> 
> 
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22  5:49             ` Eric Dumazet
@ 2013-05-22 11:58               ` Roman Gushchin
  2013-05-22 12:30                 ` Eric Dumazet
  0 siblings, 1 reply; 57+ messages in thread
From: Roman Gushchin @ 2013-05-22 11:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 22.05.2013 09:49, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 19:01 -0700, Eric Dumazet wrote:
>
>> Please use ACCESS_ONCE(), which is the standard way to deal with this,
>> and remove the rcu_dereference_raw() in
>> hlist_nulls_for_each_entry_rcu()
>>
>> something like : (for the nulls part only)
>
> Thinking a bit more about this, I am a bit worried by other uses of
> ACCESS_ONCE(ptr->field)
>
> rcu_dereference(ptr->field) intent is to reload ptr->field every time
> from memory.

Exactly.

>
> Do we really need a
>
> #define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
>
> #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
>
> and change all rcu_dererence(ptr->field) occurrences ?

Yes.
But not all of them: only "head->first". The node variable (or any other iterator) is
always a local variable that changes on every iteration, so we can rely on gcc there.

>
> I probably miss something obvious.
>
> Anyway, following patch seems to also solve the problem, I need a sleep to understand why.
>
>   	struct hlist_nulls_node *node;
> @@ -420,7 +420,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
>   begin:
>   	result = NULL;
>   	badness = 0;
> -	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
> +	udp_portaddr_for_each_entry_rcu(sk, node, head) {
>   		score = compute_score2(sk, net, saddr, sport,
>   				      daddr, hnum, dif);
>   		if (score > badness) {


IMHO, it doesn't solve it.
Ok, ok, it can actually "solve" it, just as ANY modification of the related code can.
For instance, adding something like
	unsigned int c = 0;
	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
		...
		c++;
		if (c > 1000000)
			printk(...);
	}
also "solves" the problem.

It looks semantically strange to use the ACCESS_(FIELD)_ONCE macro to
tell gcc to re-read something every time, so I created corresponding
rcu_dereference_field(_raw/_bh) macros.

Please find my third attempt at a patch below.

Regards,
Roman

-----------------------------------------------------------------------

rcu: fix a race in rcu lists traverse macros

Some network functions (udp4_lib_lookup2(), for instance) use the
RCU list traversal macros (hlist_nulls_for_each_entry_rcu, for instance)
in a way that assumes the loop may be restarted. In that case, it is strictly
necessary to re-read the head->first value from memory before each scan.
Without additional hints, gcc caches this value in a register, and if a
cached node is moved to another chain during the scan, we can loop
forever, getting wrong nulls values and restarting the loop endlessly.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
---
  include/linux/compiler.h      | 6 ++++++
  include/linux/rculist.h       | 9 +++++----
  include/linux/rculist_nulls.h | 2 +-
  include/linux/rcupdate.h      | 9 +++++++++
  4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..4109fab 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -351,6 +351,12 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
   */
  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

+/*
+ * Same as ACCESS_ONCE(), but used for accessing field of a structure.
+ * The main goal is preventing compiler to store &ptr->field in a register.
+ */
+#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
+
  /* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
  #ifdef CONFIG_KPROBES
  # define __kprobes	__attribute__((__section__(".kprobes.text")))
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 8089e35..7582cfe 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -282,7 +282,8 @@ static inline void list_splice_init_rcu(struct list_head *list,
   * as long as the traversal is guarded by rcu_read_lock().
   */
  #define list_for_each_entry_rcu(pos, head, member) \
-	for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
+	for (pos = list_entry_rcu(ACCESS_FIELD_ONCE(head, next), \
+				  typeof(*pos), member); \
  		&pos->member != (head); \
  		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

@@ -439,7 +440,7 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
  }

  #define __hlist_for_each_rcu(pos, head)				\
-	for (pos = rcu_dereference(hlist_first_rcu(head));	\
+	for (pos = rcu_dereference_field(head, first);		\
  	     pos;						\
  	     pos = rcu_dereference(hlist_next_rcu(pos)))

@@ -454,7 +455,7 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
   * as long as the traversal is guarded by rcu_read_lock().
   */
  #define hlist_for_each_entry_rcu(pos, head, member)			\
-	for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
+	for (pos = hlist_entry_safe(rcu_dereference_field_raw(head, first), \
  			typeof(*(pos)), member);			\
  		pos;							\
  		pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
@@ -471,7 +472,7 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
   * as long as the traversal is guarded by rcu_read_lock().
   */
  #define hlist_for_each_entry_rcu_bh(pos, head, member)			\
-	for (pos = hlist_entry_safe(rcu_dereference_bh(hlist_first_rcu(head)),\
+	for (pos = hlist_entry_safe(rcu_dereference_field_bh(head, first), \
  			typeof(*(pos)), member);			\
  		pos;							\
  		pos = hlist_entry_safe(rcu_dereference_bh(hlist_next_rcu(\
diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..ef431b4 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -107,7 +107,7 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
   *
   */
  #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)			\
-	for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));		\
+	for (pos = rcu_dereference_field_raw(head, first);			\
  		(!is_a_nulls(pos)) &&						\
  		({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
  		pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 4ccd68e..6fef0c2 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -640,6 +640,9 @@ static inline void rcu_preempt_sleep_check(void)

  #define rcu_dereference_raw(p) rcu_dereference_check(p, 1) /*@@@ needed? @@@*/

+#define rcu_dereference_field_raw(PTR, FIELD) \
+		rcu_dereference_raw(ACCESS_FIELD_ONCE(PTR, FIELD))
+
  /**
   * rcu_access_index() - fetch RCU index with no dereferencing
   * @p: The index to read
@@ -704,6 +707,9 @@ static inline void rcu_preempt_sleep_check(void)
   */
  #define rcu_dereference(p) rcu_dereference_check(p, 0)

+#define rcu_dereference_field(PTR, FIELD)		\
+	rcu_dereference(ACCESS_FIELD_ONCE(PTR, FIELD))
+
  /**
   * rcu_dereference_bh() - fetch an RCU-bh-protected pointer for dereferencing
   * @p: The pointer to read, prior to dereferencing
@@ -712,6 +718,9 @@ static inline void rcu_preempt_sleep_check(void)
   */
  #define rcu_dereference_bh(p) rcu_dereference_bh_check(p, 0)

+#define rcu_dereference_field_bh(PTR, FIELD)			\
+	rcu_dereference_bh(ACCESS_FIELD_ONCE(PTR, FIELD))
+
  /**
   * rcu_dereference_sched() - fetch RCU-sched-protected pointer for dereferencing
   * @p: The pointer to read, prior to dereferencing
-- 
1.8.1.2



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22  9:58             ` Paul E. McKenney
@ 2013-05-22 12:28               ` Eric Dumazet
  2013-05-22 13:00                 ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Dumazet @ 2013-05-22 12:28 UTC (permalink / raw)
  To: paulmck
  Cc: Roman Gushchin, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Wed, 2013-05-22 at 02:58 -0700, Paul E. McKenney wrote:

> Now that I am more awake...
> 
> The RCU list macros assume that the list header is either statically
> allocated (in which case no ACCESS_ONCE() or whatever is needed) or
> that the caller did whatever was necessary to protect the list header,
> whether that be holding the right lock, using rcu_dereference() when
> traversing the pointer to the list header, or whatever.

Not sure what you mean, we do hold rcu_read_lock() here.

But when we jump back to begin, we do not do
"rcu_read_unlock()/rcu_read_lock()" pair.




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22 11:58               ` Roman Gushchin
@ 2013-05-22 12:30                 ` Eric Dumazet
  2013-05-22 13:07                   ` Roman Gushchin
  2013-05-22 13:27                     ` David Laight
  0 siblings, 2 replies; 57+ messages in thread
From: Eric Dumazet @ 2013-05-22 12:30 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:

> +/*
> + * Same as ACCESS_ONCE(), but used for accessing field of a structure.
> + * The main goal is preventing compiler to store &ptr->field in a register.

But &ptr->field is a constant during the whole duration of
udp4_lib_lookup2() and could be in a register, in my case field is at
offset 0, and ptr is a parameter (so could be in a 'register')

The bug you found is that compiler caches the indirection  (ptr->field)
into a register, not that compiler stores &ptr->field into a register.

> + */
> +#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
> +

Here we force the compiler to consider ptr as volatile, but semantically
it is not required in rcu_dereference(ptr->field)  

We want field to be reloaded, not ptr.

So yes, the patch appears to fix the bug, but it sounds not logical to
me.




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22 12:28               ` Eric Dumazet
@ 2013-05-22 13:00                 ` Paul E. McKenney
  2013-05-22 14:16                   ` Eric Dumazet
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-05-22 13:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Roman Gushchin, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Wed, May 22, 2013 at 05:28:47AM -0700, Eric Dumazet wrote:
> On Wed, 2013-05-22 at 02:58 -0700, Paul E. McKenney wrote:
> 
> > Now that I am more awake...
> > 
> > The RCU list macros assume that the list header is either statically
> > allocated (in which case no ACCESS_ONCE() or whatever is needed) or
> > that the caller did whatever was necessary to protect the list header,
> > whether that be holding the right lock, using rcu_dereference() when
> > traversing the pointer to the list header, or whatever.
> 
> Not sure what you mean, we do hold rcu_read_lock() here.
> 
> But when we jump back to begin, we do not do
> "rcu_read_unlock()/rcu_read_lock()" pair.

Right, rcu_read_lock() is part of the protection, but rcu_dereference()
is the other part.

All that aside, I can't claim that I understand what problem the various
patches would solve.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22 12:30                 ` Eric Dumazet
@ 2013-05-22 13:07                   ` Roman Gushchin
  2013-05-22 17:45                     ` Paul E. McKenney
  2013-05-22 13:27                     ` David Laight
  1 sibling, 1 reply; 57+ messages in thread
From: Roman Gushchin @ 2013-05-22 13:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 22.05.2013 16:30, Eric Dumazet wrote:
> On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:
>
>> +/*
>> + * Same as ACCESS_ONCE(), but used for accessing field of a structure.
>> + * The main goal is preventing compiler to store &ptr->field in a register.
>
> But &ptr->field is a constant during the whole duration of
> udp4_lib_lookup2() and could be in a register, in my case field is at
> offset 0, and ptr is a parameter (so could be in a 'register')
>
> The bug you found is that compiler caches the indirection  (ptr->field)
> into a register, not that compiler stores &ptr->field into a register.
>
>> + */
>> +#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
>> +
>
> Here we force the compiler to consider ptr as volatile, but semantically
> it is not required in rcu_dereference(ptr->field)

Actually, we need to mark the "address of the place" where the field value is
located as volatile before dereferencing it. I have no idea how to do it another way,
except with multiple casts and offsetof()s, but, IMHO, that would be even more complex:
	ACCESS_ONCE(typeof(&ptr->field)((char*)ptr + offsetof(typeof(*ptr), field)))

>
> We want field to be reloaded, not ptr.
>
> So yes, the patch appears to fix the bug, but it sounds not logical to
> me.
>

May be we can enhance it by providing better/more detailed comments here?
Have you any suggestions?

Thanks,
Roman

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22 12:30                 ` Eric Dumazet
@ 2013-05-22 13:27                     ` David Laight
  2013-05-22 13:27                     ` David Laight
  1 sibling, 0 replies; 57+ messages in thread
From: David Laight @ 2013-05-22 13:27 UTC (permalink / raw)
  To: Eric Dumazet, Roman Gushchin
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy


> So yes, the patch appears to fix the bug, but it sounds not logical to
> me.

I was confused because the copy of the code I found was different
(it has some checks for reuseaddr - which force a function call in the
loop).

The code being compiled is:

begin:
	result = NULL;
	badness = -1;
	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
		score = compute_score2(sk, net, saddr, sport,
				      daddr, hnum, dif);
		if (score > badness) {
			result = sk;
			badness = score;
			if (score == SCORE2_MAX)
				goto exact_match;
		}
	}
	/*
	 * if the nulls value we got at the end of this lookup is
	 * not the expected one, we must restart lookup.
	 * We probably met an item that was moved to another chain.
	 */
	if (get_nulls_value(node) != slot2)
		goto begin;

Which is entirely inlined - so the compiler is allowed to assume
that no other code modifies any of the data.
Hence it is allowed to cache the list head value.
Indeed it could convert the last line to "for (;;);".

An asm volatile ("":::"memory") somewhere would fix it.

	David


^ permalink raw reply	[flat|nested] 57+ messages in thread


* RE: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22 13:27                     ` David Laight
  (?)
@ 2013-05-22 13:36                     ` Eric Dumazet
  2013-05-22 14:23                         ` David Laight
  -1 siblings, 1 reply; 57+ messages in thread
From: Eric Dumazet @ 2013-05-22 13:36 UTC (permalink / raw)
  To: David Laight
  Cc: Roman Gushchin, paulmck, Dipankar Sarma, zhmurov, linux-kernel,
	netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Wed, 2013-05-22 at 14:27 +0100, David Laight wrote:
> > So yes, the patch appears to fix the bug, but it sounds not logical to
> > me.
> 
> I was confused because the copy of the code I found was different
> (it has some checks for reusaddr - which force a function call in the
> loop).
> 
> The code being compiled is:
> 
> begin:
> 	result = NULL;
> 	badness = -1;
> 	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {

Here this loop begins with

 someptr = rcu_dereference(somelocation);

My claim is that rcu_dereference() should force the compiler to read again
somelocation. It's done thanks to ACCESS_ONCE(). But apparently in the
specific case of &hslot->head, it doesn't work.

When head is used directly (i.e. the same value), ACCESS_ONCE() does
its job correctly.


> 		score = compute_score2(sk, net, saddr, sport,
> 				      daddr, hnum, dif);
> 		if (score > badness) {
> 			result = sk;
> 			badness = score;
> 			if (score == SCORE2_MAX)
> 				goto exact_match;
> 		}
> 	}
> 	/*
> 	 * if the nulls value we got at the end of this lookup is
> 	 * not the expected one, we must restart lookup.
> 	 * We probably met an item that was moved to another chain.
> 	 */
> 	if (get_nulls_value(node) != slot2)
> 		goto begin;
> 
> Which is entirely inlined - so the compiler is allowed to assume
> that no other code modifies any of the data.

You missed the rcu_dereference()/ACCESS_ONCE() we have in this loop.






^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22 13:27                     ` David Laight
  (?)
  (?)
@ 2013-05-22 13:55                     ` Roman Gushchin
  -1 siblings, 0 replies; 57+ messages in thread
From: Roman Gushchin @ 2013-05-22 13:55 UTC (permalink / raw)
  To: David Laight
  Cc: Eric Dumazet, paulmck, Dipankar Sarma, zhmurov, linux-kernel,
	netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 22.05.2013 17:27, David Laight wrote:
>> So yes, the patch appears to fix the bug, but it sounds not logical to
>> me.
>
> I was confused because the copy of the code I found was different
> (it has some checks for reusaddr - which force a function call in the
> loop).
>
> The code being compiled is:
>
> begin:
> 	result = NULL;
> 	badness = -1;
> 	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
> 		score = compute_score2(sk, net, saddr, sport,
> 				      daddr, hnum, dif);
> 		if (score > badness) {
> 			result = sk;
> 			badness = score;
> 			if (score == SCORE2_MAX)
> 				goto exact_match;
> 		}
> 	}
> 	/*
> 	 * if the nulls value we got at the end of this lookup is
> 	 * not the expected one, we must restart lookup.
> 	 * We probably met an item that was moved to another chain.
> 	 */
> 	if (get_nulls_value(node) != slot2)
> 		goto begin;
>
> Which is entirely inlined - so the compiler is allowed to assume
> that no other code modifies any of the data.
> Hence it is allowed to cache the list head value.
> Indeed it could convert the last line to "for (;;);".
>
> A asm volatile ("":::"memory") somewhere would fix it.
>

Yes, it does. It was my first attempt to fix the issue.
But putting such instructions into every such loop isn't a good idea, IMHO.

Regards,
Roman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22 13:00                 ` Paul E. McKenney
@ 2013-05-22 14:16                   ` Eric Dumazet
  0 siblings, 0 replies; 57+ messages in thread
From: Eric Dumazet @ 2013-05-22 14:16 UTC (permalink / raw)
  To: paulmck
  Cc: Roman Gushchin, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Wed, 2013-05-22 at 06:00 -0700, Paul E. McKenney wrote:

> Right, rcu_read_lock() is part of the protection, but rcu_dereference()
> is the other part.
> 
> All that aside, I can't claim that I understand what problem the various
> patches would solve.  ;-)

Problem is that rcu_dereference(expr) might be optimized by the compiler
to cache the dereferenced data in certain circumstances.

The following patch shows the difference if you look at the generated code:

This patch fixes the problem, and it's not really obvious why!

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0bf5d39..6aa8088 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -416,11 +416,12 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 	struct hlist_nulls_node *node;
 	int score, badness, matches = 0, reuseport = 0;
 	u32 hash = 0;
+	struct hlist_nulls_head *head = &hslot2->head;
 
 begin:
 	result = NULL;
 	badness = 0;
-	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
+	udp_portaddr_for_each_entry_rcu(sk, node, head) {
 		score = compute_score2(sk, net, saddr, sport,
 				      daddr, hnum, dif);
 		if (score > badness) {




^ permalink raw reply related	[flat|nested] 57+ messages in thread

* RE: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22 13:36                     ` Eric Dumazet
@ 2013-05-22 14:23                         ` David Laight
  0 siblings, 0 replies; 57+ messages in thread
From: David Laight @ 2013-05-22 14:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Roman Gushchin, paulmck, Dipankar Sarma, zhmurov, linux-kernel,
	netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy


> Here this loops begin by
> 
>  someptr = rcu_dereference(somelocation);
> 
> My claim is that rcu_dereference() should force the compiler to read again
> somelocation. It's done thanks to ACCESS_ONCE(). But apparently in the
> specific case of &hslot->head, it doesn't work.

Hmmm....
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
That might be doomed to fail for much the same reason as:
void x(struct foo *unaligned_ptr)
{
    char *p = (void *)unaligned_ptr;
    memcpy(tgt, p, sizeof *p);
}
generates alignment faults.
And casts to a union type don't get around 'strict aliasing'.

Basically the compiler makes use of the fact that you should
cast addresses back to their original type before dereferencing them.

So I'm not sure you can use a cast to add a type qualifier.
The front-end lets you remove 'const', but I suspect the optimiser
is using the original types.

	David


^ permalink raw reply	[flat|nested] 57+ messages in thread


* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22 13:07                   ` Roman Gushchin
@ 2013-05-22 17:45                     ` Paul E. McKenney
  2013-05-22 19:17                       ` Roman Gushchin
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-05-22 17:45 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Eric Dumazet, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Wed, May 22, 2013 at 05:07:07PM +0400, Roman Gushchin wrote:
> On 22.05.2013 16:30, Eric Dumazet wrote:
> >On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:
> >
> >>+/*
> >>+ * Same as ACCESS_ONCE(), but used for accessing field of a structure.
> >>+ * The main goal is preventing compiler to store &ptr->field in a register.
> >
> >But &ptr->field is a constant during the whole duration of
> >udp4_lib_lookup2() and could be in a register, in my case field is at
> >offset 0, and ptr is a parameter (so could be in a 'register')
> >
> >The bug you found is that compiler caches the indirection  (ptr->field)
> >into a register, not that compiler stores &ptr->field into a register.
> >
> >>+ */
> >>+#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
> >>+
> >
> >Here we force the compiler to consider ptr as volatile, but semantically
> >it is not required in rcu_dereference(ptr->field)
> 
> Actually, we need to mark an "address of a place" where the field value is
> located as volatile before dereferencing. I have no idea how to do it in another way,
> except using multiple casts and offsetof's, but, IMHO, it will be even more complex:
> 	ACCESS_ONCE(typeof(&ptr->field)((char*)ptr + offsetof(typeof(*ptr), field)))

Why not just ACCESS_ONCE(ptr->field)?  Or if it is the thing that
ptr->field points to that is subject to change, ACCESS_ONCE(*ptr->field)?

Or rcu_dereference(ptr->field), as appropriate?

							Thanx, Paul

> >We want field to be reloaded, not ptr.
> >
> >So yes, the patch appears to fix the bug, but it sounds not logical to
> >me.
> >
> 
> May be we can enhance it by providing better/more detailed comments here?
> Have you any suggestions?
> 
> Thanks,
> Roman
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22 17:45                     ` Paul E. McKenney
@ 2013-05-22 19:17                       ` Roman Gushchin
  2013-05-25 11:37                         ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Roman Gushchin @ 2013-05-22 19:17 UTC (permalink / raw)
  To: paulmck
  Cc: Eric Dumazet, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 22.05.2013 21:45, Paul E. McKenney wrote:
> On Wed, May 22, 2013 at 05:07:07PM +0400, Roman Gushchin wrote:
>> On 22.05.2013 16:30, Eric Dumazet wrote:
>>> On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:
>>>
>>>> +/*
>>>> + * Same as ACCESS_ONCE(), but used for accessing field of a structure.
>>>> + * The main goal is preventing compiler to store &ptr->field in a register.
>>>
>>> But &ptr->field is a constant during the whole duration of
>>> udp4_lib_lookup2() and could be in a register, in my case field is at
>>> offset 0, and ptr is a parameter (so could be in a 'register')
>>>
>>> The bug you found is that compiler caches the indirection  (ptr->field)
>>> into a register, not that compiler stores &ptr->field into a register.
>>>
>>>> + */
>>>> +#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
>>>> +
>>>
>>> Here we force the compiler to consider ptr as volatile, but semantically
>>> it is not required in rcu_dereference(ptr->field)
>>
>> Actually, we need to mark an "address of a place" where the field value is
>> located as volatile before dereferencing. I have no idea how to do it in another way,
>> except using multiple casts and offsetof's, but, IMHO, it will be even more complex:
>> 	ACCESS_ONCE(typeof(&ptr->field)((char*)ptr + offsetof(typeof(*ptr), field)))

Probably I missed one more ACCESS_ONCE() in this expression. It should be something like
ACCESS_ONCE(typeof(&ptr->field)((char*)ACCESS_ONCE(ptr) + offsetof(typeof(*ptr), field))) .
But this is not a working example, just an illustration against using ACCESS_ONCE() here.

> Why not just ACCESS_ONCE(ptr->field)?  Or if it is the thing that
> ptr->field points to that is subject to change, ACCESS_ONCE(*ptr->field)?
>
> Or rcu_dereference(ptr->field), as appropriate?

It's not enough. That is how the code works now, and it doesn't work as expected.
You can't write (ptr->field) without ptr being marked as a volatile pointer.

I try to explain the problem once more from scratch:

1) We have the following pseudo-code (based on udp4_lib_lookup2()):

static void some_func(struct hlist_nulls_head *head) {
	struct hlist_nulls_node *node;

begin:
	for (node = rcu_dereference(head->first);
		!is_nulls(node) && ...;
		node = rcu_dereference(node->next)) {
		<...>
	}

	if (restart_condition)
		goto begin;

2) A problem occurs when restart_condition is true and we jump to the begin label.
We do not recalculate the (head + offsetof(head, first)) address; we just dereference
the OLD (head->first) pointer again. So, we get a node that WAS the first node at some
previous time instead of the current first node. That node can be dead, or, for instance,
can be the head of another chain.

It is correct from gcc's point of view, since it doesn't expect the head pointer
to change, and this address is just (head + constant offset).

3) If we start with a wrong first element, restart_condition can always be true, so
we get an infinite loop. In any case, we do not scan the whole (socket) chain,
as expected.

This scenario is absolutely real and causes our DNS servers to hang
sometimes under high load.

Note that there are no problems if we don't restart the loop. Also, it is highly
dependent on gcc options and the code in the body of the loop. Even small changes
in the code (like adding a debugging print) prevent the issue from reproducing.

Thanks!

Regards,
Roman

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-22 19:17                       ` Roman Gushchin
@ 2013-05-25 11:37                         ` Paul E. McKenney
  2013-05-27 11:34                           ` Roman Gushchin
  2013-05-27 17:55                           ` Roman Gushchin
  0 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-05-25 11:37 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Eric Dumazet, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Wed, May 22, 2013 at 11:17:46PM +0400, Roman Gushchin wrote:
> On 22.05.2013 21:45, Paul E. McKenney wrote:
> >On Wed, May 22, 2013 at 05:07:07PM +0400, Roman Gushchin wrote:
> >>On 22.05.2013 16:30, Eric Dumazet wrote:
> >>>On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:
> >>>
> >>>>+/*
> >>>>+ * Same as ACCESS_ONCE(), but used for accessing field of a structure.
> >>>>+ * The main goal is preventing compiler to store &ptr->field in a register.
> >>>
> >>>But &ptr->field is a constant during the whole duration of
> >>>udp4_lib_lookup2() and could be in a register, in my case field is at
> >>>offset 0, and ptr is a parameter (so could be in a 'register')
> >>>
> >>>The bug you found is that compiler caches the indirection  (ptr->field)
> >>>into a register, not that compiler stores &ptr->field into a register.
> >>>
> >>>>+ */
> >>>>+#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
> >>>>+
> >>>
> >>>Here we force the compiler to consider ptr as volatile, but semantically
> >>>it is not required in rcu_dereference(ptr->field)
> >>
> >>Actually, we need to mark an "address of a place" where the field value is
> >>located as volatile before dereferencing. I have no idea how to do it in another way,
> >>except using multiple casts and offsetof's, but, IMHO, it will be even more complex:
> >>	ACCESS_ONCE(typeof(&ptr->field)((char*)ptr + offsetof(typeof(*ptr), field)))
> 
> Probably I'm missing one more ACCESS_ONCE() in this expression. It should be something like
> ACCESS_ONCE(typeof(&ptr->field)((char*)ACCESS_ONCE(ptr) + offsetof(typeof(*ptr), field))) .
> But this is not a working example, just an illustration of an argument against using ACCESS_ONCE() here.
> 
> >Why not just ACCESS_ONCE(ptr->field)?  Or if it is the thing that
> >ptr->field points to that is subject to change, ACCESS_ONCE(*ptr->field)?
> >
> >Or rcu_dereference(ptr->field), as appropriate?
> 
> It's not enough. That is how the code works now, and it doesn't work as expected.
> You can't write (ptr->field) and get a reload without ptr being marked as a volatile pointer.
> 
> I try to explain the problem once more from scratch:
> 
> 1) We have the following pseudo-code (based on udp4_lib_lookup2()):
> 
> static void some_func(struct hlist_nulls_head *head) {
> 	struct hlist_nulls_node *node;
> 
> begin:
> 	for (node = rcu_dereference(head->first);
> 		!is_nulls(node) && ...;
> 		node = rcu_dereference(node->next)) {
> 		<...>
> 	}
> 
> 	if (restart_condition)
> 		goto begin;
> }
> 
> 2) A problem occurs when restart_condition is true and we jump to the begin label.
> We do not recalculate the (head + offsetof(head, first)) address; we just dereference
> the OLD (head->first) pointer again. So we get a node that WAS the first node at some
> previous time, instead of the current first node. That node can be dead or, for instance,
> can be the head of another chain.

OK, this is what I was referring to when I said that the RCU list macros
assume that the list header is static (or equivalently, appropriately
protected).

With some_func() as written above, you would need to return some sort
of failure indication from some_func(), and have the caller refetch head.
Otherwise, as far as gcc is concerned, the value of the parameter head
is constant throughout some_func().
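
In outline, the structure I have in mind would be something like the following
userspace sketch. All names here (lookup_once(), get_head(), the restart flag)
are made up for the example; this is not the actual kernel code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch only: the traversal function reports "please retry" to its caller
 * instead of restarting internally, and the caller refetches the head, so
 * the compiler cannot legally cache head->first across attempts.
 */

struct node { int key; struct node *next; };
struct head { struct node *first; };

/* One traversal attempt; sets *restart when the scan must be retried. */
static struct node *lookup_once(struct head *h, int key, int *restart)
{
	struct node *n;

	*restart = 0;
	for (n = h->first; n; n = n->next)
		if (n->key == key)
			return n;
	/* A real RCU/nulls version would set *restart on a bad nulls value. */
	return NULL;
}

/* The retry loop lives in the caller, which refetches the head pointer. */
static struct node *lookup(struct head *(*get_head)(void), int key)
{
	struct node *n;
	int restart;

	do {
		struct head *h = get_head();	/* fresh head each attempt */
		n = lookup_once(h, key, &restart);
	} while (restart);
	return n;
}

static struct node b = { .key = 2 }, a = { .key = 1, .next = &b };
static struct head table = { .first = &a };
static struct head *get_head(void) { return &table; }
```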

> It is correct from gcc's point of view, since it doesn't expect the head pointer
> to change, and this address is just (head + constant offset).

Agreed.

How does the caller calculate the value to pass in through the argument "head"?

> 3) If we start with a wrong first element, restart_condition can always be true, so
> we get an infinite loop. In any case, we do not scan the whole (socket) chain
> as expected.

Agreed.

> This scenario is absolutely real and causes our DNS servers to hang
> sometimes under high load.

I completely believe that such a hang could happen.

> Note that there are no problems if we don't restart the loop. Also, it is highly
> dependent on gcc options and the code in the body of the loop. Even small changes
> in the code (like adding a debug print) prevent the issue from being reproduced.

Again, I believe that your retry logic needs to extend back into the
calling function for your some_func() example above.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-25 11:37                         ` Paul E. McKenney
@ 2013-05-27 11:34                           ` Roman Gushchin
  2013-05-27 17:55                           ` Roman Gushchin
  1 sibling, 0 replies; 57+ messages in thread
From: Roman Gushchin @ 2013-05-27 11:34 UTC (permalink / raw)
  To: paulmck, Eric Dumazet
  Cc: Dipankar Sarma, zhmurov, linux-kernel, netdev, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy

On 25.05.2013 15:37, Paul E. McKenney wrote:
>> 2) A problem occurs when restart_condition is true and we jump to the begin label.
>> We do not recalculate the (head + offsetof(head, first)) address; we just dereference
>> the OLD (head->first) pointer again. So we get a node that WAS the first node at some
>> previous time, instead of the current first node. That node can be dead or, for instance,
>> can be the head of another chain.
>
> OK, this is what I was referring to when I said that the RCU list macros
> assume that the list header is static (or equivalently, appropriately
> protected).
>
> With some_func() as written above, you would need to return some sort
> of failure indication from some_func(), and have the caller refetch head.
> Otherwise, as far as gcc is concerned, the value of the parameter head
> is constant throughout some_func().

Personally I have nothing against this approach, but I'm not sure whether the
networking maintainers will adopt the corresponding patch. I'll try to find out.

>
>> It is correct from gcc's point of view, since it doesn't expect the head pointer
>> to change, and this address is just (head + constant offset).
>
> Agreed.
>
> How does the caller calculate the value to pass in through the argument "head"?

struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
		__be16 sport, __be32 daddr, __be16 dport,
		int dif, struct udp_table *udptable)
{
	...
	unsigned int hash2, slot2, slot = udp_hashfn(net, hnum, udptable->mask);
	struct udp_hslot *hslot2, *hslot = &udptable->hash[slot];
	int score, badness;

	rcu_read_lock();
	if (hslot->count > 10) {
		hash2 = udp4_portaddr_hash(net, daddr, hnum);
		slot2 = hash2 & udptable->mask;
		hslot2 = &udptable->hash2[slot2];
		...
		result = udp4_lib_lookup2(net, saddr, sport,
					  daddr, hnum, dif,
					  hslot2, slot2);
	...
}

static struct sock *udp4_lib_lookup2(struct net *net,
		__be32 saddr, __be16 sport,
		__be32 daddr, unsigned int hnum, int dif,
		struct udp_hslot *hslot2, unsigned int slot2)
{
	...
	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
		...

Thank you!

Regards,
Roman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-25 11:37                         ` Paul E. McKenney
  2013-05-27 11:34                           ` Roman Gushchin
@ 2013-05-27 17:55                           ` Roman Gushchin
  2013-05-28  0:12                             ` Eric Dumazet
  1 sibling, 1 reply; 57+ messages in thread
From: Roman Gushchin @ 2013-05-27 17:55 UTC (permalink / raw)
  To: paulmck
  Cc: Eric Dumazet, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

Hi, Paul!

> On 25.05.2013 15:37, Paul E. McKenney wrote:
>> Again, I believe that your retry logic needs to extend back into the
>> calling function for your some_func() example above.

And what do you think about the following approach (diff below)?

It seems to me that it's clear enough (especially with good accompanying comments)
and produces good binary code (without significant overhead).
Also, it removes a hidden pitfall in using RCU-protected (h)list traversals with restarts.

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 6f95e24..8d8b1eb 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -269,7 +269,8 @@ static inline void list_splice_init_rcu(struct list_head *list,
   * as long as the traversal is guarded by rcu_read_lock().
   */
  #define list_for_each_entry_rcu(pos, head, member) \
-       for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
+       for (ACCESS_ONCE(*(head)), \
+               pos = list_entry_rcu((head)->next, typeof(*pos), member); \
                 &pos->member != (head); \
                 pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

@@ -459,7 +460,8 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
   * as long as the traversal is guarded by rcu_read_lock().
   */
  #define hlist_for_each_entry_rcu(tpos, pos, head, member)              \
-       for (pos = rcu_dereference_raw(hlist_first_rcu(head));          \
+       for (ACCESS_ONCE(*(head)),                                      \
+               pos = rcu_dereference_raw(hlist_first_rcu(head));       \
                 pos &&                                                   \
                 ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
                 pos = rcu_dereference_raw(hlist_next_rcu(pos)))
@@ -476,7 +478,7 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
   * as long as the traversal is guarded by rcu_read_lock().
   */
  #define hlist_for_each_entry_rcu_bh(tpos, pos, head, member)            \
-       for (pos = rcu_dereference_bh((head)->first);                    \
+       for (ACCESS_ONCE(*(head)), pos = rcu_dereference_bh((head)->first); \
                 pos &&                                                   \
                 ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
                 pos = rcu_dereference_bh(pos->next))
diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..4af5ee5 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -107,7 +107,8 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
   *
   */
  #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
-       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
+       for (ACCESS_ONCE(*(head)),                                              \
+               pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));         \
                 (!is_a_nulls(pos)) &&                                           \
                 ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
                 pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))

---

Thanks!

Regards,
Roman


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-27 17:55                           ` Roman Gushchin
@ 2013-05-28  0:12                             ` Eric Dumazet
  2013-05-28  9:10                               ` Roman Gushchin
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Dumazet @ 2013-05-28  0:12 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote:
> Hi, Paul!
> 
> > On 25.05.2013 15:37, Paul E. McKenney wrote:
> >> Again, I believe that your retry logic needs to extend back into the
> >> calling function for your some_func() example above.
> 
> And what do you think about the following approach (diff below)?
> 
> It seems to me, it's enough clear (especially with good accompanying comments)
> and produces a good binary code (without significant overhead).
> Also, we will remove a hidden reef in using rcu-protected (h)list traverses with restarts.
> 

> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 2ae1371..4af5ee5 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -107,7 +107,8 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
>    *
>    */
>   #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
> -       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
> +       for (ACCESS_ONCE(*(head)),                                              \
> +               pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));         \
>                  (!is_a_nulls(pos)) &&                                           \
>                  ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>                  pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))

It looks like this still relies on gcc being friendly here.

I repeat again : @head here is a constant.

Macro already uses ACCESS_ONCE(), we only have to instruct gcc that
caching the value is forbidden if we restart the loop 
(aka "goto begin;" see Documentation/RCU/rculist_nulls.txt line 146)

Adding a barrier() is probably what we want.

I cooked followed patch and it fixes the problem.

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..4dc51b2 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -105,8 +105,12 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
  * @head:	the head for your list.
  * @member:	the name of the hlist_nulls_node within the struct.
  *
+ * The barrier() is needed to make sure compiler doesn't cache first element,
+ * as this loop can be restarted.
+ * (cf Documentation/RCU/rculist_nulls.txt around line 146)
  */
 #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)			\
+	barrier();								\
 	for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));		\
 		(!is_a_nulls(pos)) &&						\
 		({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-28  0:12                             ` Eric Dumazet
@ 2013-05-28  9:10                               ` Roman Gushchin
  2013-05-29  0:34                                 ` Eric Dumazet
  2013-05-29  1:19                                 ` Paul E. McKenney
  0 siblings, 2 replies; 57+ messages in thread
From: Roman Gushchin @ 2013-05-28  9:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 28.05.2013 04:12, Eric Dumazet wrote:
> On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote:
>> Hi, Paul!
>>
>>> On 25.05.2013 15:37, Paul E. McKenney wrote:
>>>> Again, I believe that your retry logic needs to extend back into the
>>>> calling function for your some_func() example above.
>>
>> And what do you think about the following approach (diff below)?
>>
>> It seems to me, it's enough clear (especially with good accompanying comments)
>> and produces a good binary code (without significant overhead).
>> Also, we will remove a hidden reef in using rcu-protected (h)list traverses with restarts.
>>
>
>> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
>> index 2ae1371..4af5ee5 100644
>> --- a/include/linux/rculist_nulls.h
>> +++ b/include/linux/rculist_nulls.h
>> @@ -107,7 +107,8 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
>>     *
>>     */
>>    #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
>> -       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
>> +       for (ACCESS_ONCE(*(head)),                                              \
>> +               pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));         \
>>                   (!is_a_nulls(pos)) &&                                           \
>>                   ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>>                   pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>
> It looks like this still relies on gcc being friendly here.
>
> I repeat again : @head here is a constant.

No.
Actually, there are two volatile objects: the pointer to the first element (as part of the
head structure), and the first element itself. So, to be strict, head as a structure contains
a volatile field. head->first should be treated as a volatile pointer to volatile data. So,
the whole head object is volatile.

>
> Macro already uses ACCESS_ONCE(), we only have to instruct gcc that
> caching the value is forbidden if we restart the loop
> (aka "goto begin;" see Documentation/RCU/rculist_nulls.txt line 146)

My patch seems to be correct, because ACCESS_ONCE(*(head)) will cause gcc to (re)read the head data from memory.
According to the gcc documentation:
	"A scalar volatile object is read when it is accessed in a void context:
      		volatile int *src = somevalue;
      		*src;
	Such expressions are rvalues, and GCC implements this as a read of the volatile object being pointed to."
And this is exactly our case.
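
As a standalone userspace illustration (with a simplified, local ACCESS_ONCE(),
not the kernel's definition, and made-up type names):

```c
#include <assert.h>

/*
 * Illustration of the void-context volatile read quoted above from the
 * gcc manual: the statement "ACCESS_ONCE(*h);" is a read of a volatile
 * object, so gcc may not reuse a value of h->first cached from before
 * that statement.
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct nd { struct nd *next; };
struct hd { struct nd *first; };

static struct nd *first_after_reload(struct hd *h)
{
	ACCESS_ONCE(*h);	/* void-context volatile read: forces a reload */
	return h->first;
}

/* Functional self-check: the second call must see the updated pointer. */
static int demo(void)
{
	struct nd n1 = { 0 };
	struct hd h = { &n1 };
	struct nd *p = first_after_reload(&h);

	h.first = 0;
	return p == &n1 && first_after_reload(&h) == 0;
}
```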

> Adding a barrier() is probably what we want.

I agree, inserting barrier() is also a correct and working fix.

Thanks!

Regards,
Roman

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-28  9:10                               ` Roman Gushchin
@ 2013-05-29  0:34                                 ` Eric Dumazet
  2013-05-29  1:31                                   ` Paul E. McKenney
  2013-05-29  9:17                                   ` Roman Gushchin
  2013-05-29  1:19                                 ` Paul E. McKenney
  1 sibling, 2 replies; 57+ messages in thread
From: Eric Dumazet @ 2013-05-29  0:34 UTC (permalink / raw)
  To: Roman Gushchin, Jesper Dangaard Brouer
  Cc: paulmck, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
> On 28.05.2013 04:12, Eric Dumazet wrote:

> > Adding a barrier() is probably what we want.
> 
> I agree, inserting barrier() is also a correct and working fix.

Yeah, but I can not find a clean way to put it inside the "for (;;)"

for (barrier();;)  ->

error: expected expression before ‘__asm__’

No user currently does :

if (condition)
	hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)

But who knows...
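
For reference, GCC's statement-expression extension ({ ... }) can wrap a
statement like barrier() into something usable where the for-loop grammar
wants an expression; a minimal sketch (assuming GCC extensions, with a
barrier() mirroring the kernel's definition):

```c
#include <assert.h>

/* Compiler barrier, as in the kernel: an asm *statement*, not an expression. */
#define barrier() __asm__ __volatile__("" : : : "memory")

static int sum_first_three(void)
{
	int i, sum = 0;

	/* ({ barrier(); }) turns the barrier into a (void) expression, so
	 * it can sit in the first for-clause via the comma operator. */
	for (({ barrier(); }), i = 0; i < 3; i++)
		sum += i;
	return sum;
}
```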



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-28  9:10                               ` Roman Gushchin
  2013-05-29  0:34                                 ` Eric Dumazet
@ 2013-05-29  1:19                                 ` Paul E. McKenney
  1 sibling, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-05-29  1:19 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Eric Dumazet, Dipankar Sarma, zhmurov, linux-kernel, netdev,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Tue, May 28, 2013 at 01:10:46PM +0400, Roman Gushchin wrote:
> On 28.05.2013 04:12, Eric Dumazet wrote:
> >On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote:
> >>Hi, Paul!
> >>
> >>>On 25.05.2013 15:37, Paul E. McKenney wrote:
> >>>>Again, I believe that your retry logic needs to extend back into the
> >>>>calling function for your some_func() example above.
> >>
> >>And what do you think about the following approach (diff below)?
> >>
> >>It seems to me, it's enough clear (especially with good accompanying comments)
> >>and produces a good binary code (without significant overhead).
> >>Also, we will remove a hidden reef in using rcu-protected (h)list traverses with restarts.
> >>
> >
> >>diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> >>index 2ae1371..4af5ee5 100644
> >>--- a/include/linux/rculist_nulls.h
> >>+++ b/include/linux/rculist_nulls.h
> >>@@ -107,7 +107,8 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
> >>    *
> >>    */
> >>   #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
> >>-       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
> >>+       for (ACCESS_ONCE(*(head)),                                              \
> >>+               pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));         \
> >>                  (!is_a_nulls(pos)) &&                                           \
> >>                  ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
> >>                  pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
> >
> >It looks like this still relies on gcc being friendly here.
> >
> >I repeat again : @head here is a constant.
> 
> No.
> Actually, there are two volatile objects: pointer to the first element (as a part of the head structure),
> and the first element by itself. So, to be strict, head as a structure contains a volatile field.
> Head->first should be treated as a volatile pointer to a volatile data. So, the whole head object is volatile.
> 
> >
> >Macro already uses ACCESS_ONCE(), we only have to instruct gcc that
> >caching the value is forbidden if we restart the loop
> >(aka "goto begin;" see Documentation/RCU/rculist_nulls.txt line 146)
> 
> My patch seems to be correct, because, ACCESS_ONCE(*(head)) will cause gcc to (re)read head data from the memory.
> According to gcc documentation:
> 	"A scalar volatile object is read when it is accessed in a void context:
>      		volatile int *src = somevalue;
>      		*src;
> 	Such expressions are rvalues, and GCC implements this as a read of the volatile object being pointed to."
> And this is exactly our case.
> 
> >Adding a barrier() is probably what we want.
> 
> I agree, inserting barrier() is also a correct and working fix.

I have to ask this question of both of you...

Are either of Eric's or Roman's fixes really guaranteed to work if gcc
elects not to inline the function containing the RCU list traversal?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-29  0:34                                 ` Eric Dumazet
@ 2013-05-29  1:31                                   ` Paul E. McKenney
  2013-05-29  5:08                                     ` Eric Dumazet
  2013-05-29  9:17                                   ` Roman Gushchin
  1 sibling, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2013-05-29  1:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Roman Gushchin, Jesper Dangaard Brouer, Dipankar Sarma, zhmurov,
	linux-kernel, netdev, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy

On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote:
> On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
> > On 28.05.2013 04:12, Eric Dumazet wrote:
> 
> > > Adding a barrier() is probably what we want.
> > 
> > I agree, inserting barrier() is also a correct and working fix.
> 
> Yeah, but I can not find a clean way to put it inside the "for (;;)"
> 
> for (barrier();;)  ->
> 
> error: expected expression before ‘__asm__’
> 
> No user currently does :
> 
> if (condition)
> 	hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)
> 
> But who knows...

I still have my earlier question, but I suggest "({ barrier(); XXX })"
to put the barrier into the for loop, either in the second or third
clause, where XXX was the original second or third clause.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-29  1:31                                   ` Paul E. McKenney
@ 2013-05-29  5:08                                     ` Eric Dumazet
  2013-05-29 10:09                                       ` Roman Gushchin
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Dumazet @ 2013-05-29  5:08 UTC (permalink / raw)
  To: paulmck
  Cc: Roman Gushchin, Jesper Dangaard Brouer, Dipankar Sarma, zhmurov,
	linux-kernel, netdev, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy

On Tue, 2013-05-28 at 18:31 -0700, Paul E. McKenney wrote:
> On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote:
> > On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
> > > On 28.05.2013 04:12, Eric Dumazet wrote:
> > 
> > > > Adding a barrier() is probably what we want.
> > > 
> > > I agree, inserting barrier() is also a correct and working fix.
> > 
> > Yeah, but I can not find a clean way to put it inside the "for (;;)"
> > 
> > for (barrier();;)  ->
> > 
> > error: expected expression before ‘__asm__’
> > 
> > No user currently does :
> > 
> > if (condition)
> > 	hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)
> > 
> > But who knows...
> 
> I still have my earlier question, but I suggest "({ barrier(); XXX })"
> to put the barrier into the for loop, either in the second or third
> clause, where XXX was the original second or third clause.
> 
> 	

Hmm, it doesn't work; I wanted to replace:

barrier();
for (expr1 ; expr2; expr3) {

by :

for ( some_clever_thing ; expr2; expr3) {

So barrier() should not be in the second or third clause: it must be done
once, before "expr1".

Not a big deal anyway, we're not adding new
hlist_nulls_for_each_entry_rcu() users :)


About your earlier question: I really don't know why the compiler would
cache a memory read if we explicitly use barrier() to prevent this from
happening.

BTW, Roman's patch generates a double load, as in:

2cb1:       49 8b 07                mov    (%r15),%rax    
2cb4:       49 8b 07                mov    (%r15),%rax        


...
2ea2:       e8 f9 dc ff ff          callq  ba0 <sock_put>
2ea7:       8b 0c 24                mov    (%rsp),%ecx
2eaa:       e9 02 fe ff ff          jmpq   2cb1 <udp4_lib_lookup2+0x91> 

because ACCESS_ONCE() is used twice: once explicitly in
hlist_nulls_for_each_entry_rcu(), and once in rcu_dereference_raw().

Meanwhile, barrier(); ptr = rcu_dereference(X); generates a single load.




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-29  0:34                                 ` Eric Dumazet
  2013-05-29  1:31                                   ` Paul E. McKenney
@ 2013-05-29  9:17                                   ` Roman Gushchin
  1 sibling, 0 replies; 57+ messages in thread
From: Roman Gushchin @ 2013-05-29  9:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jesper Dangaard Brouer, paulmck, Dipankar Sarma, zhmurov,
	linux-kernel, netdev, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy

On 29.05.2013 04:34, Eric Dumazet wrote:
> On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
>> On 28.05.2013 04:12, Eric Dumazet wrote:
>
>>> Adding a barrier() is probably what we want.
>>
>> I agree, inserting barrier() is also a correct and working fix.
>
> Yeah, but I can not find a clean way to put it inside the "for (;;)"

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..74d5065 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -107,7 +107,7 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
   *
   */
  #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
-       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
+       for (({barrier();}), pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
                 (!is_a_nulls(pos)) &&                                           \
                 ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
                 pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))




^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-29  5:08                                     ` Eric Dumazet
@ 2013-05-29 10:09                                       ` Roman Gushchin
  2013-05-29 19:06                                         ` Eric Dumazet
  0 siblings, 1 reply; 57+ messages in thread
From: Roman Gushchin @ 2013-05-29 10:09 UTC (permalink / raw)
  To: Eric Dumazet, paulmck
  Cc: Jesper Dangaard Brouer, Dipankar Sarma, zhmurov, linux-kernel,
	netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 29.05.2013 09:08, Eric Dumazet wrote:
> On Tue, 2013-05-28 at 18:31 -0700, Paul E. McKenney wrote:
>> On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote:
>>> On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
>>>> On 28.05.2013 04:12, Eric Dumazet wrote:

> About your earlier question, I really don't know why compiler would
> cache a memory read if we explicitly use barrier() to prevent this from
> happening.

I agree.

> BTW Roman patch generates a double load as in :
>
> 2cb1:       49 8b 07                mov    (%r15),%rax
> 2cb4:       49 8b 07                mov    (%r15),%rax
>
>
> ...
> 2ea2:       e8 f9 dc ff ff          callq  ba0 <sock_put>
> 2ea7:       8b 0c 24                mov    (%rsp),%ecx
> 2eaa:       e9 02 fe ff ff          jmpq   2cb1 <udp4_lib_lookup2+0x91>
>
> because of ACCESS_ONCE() used twice, once explicitly in
> hlist_nulls_for_each_entry_rcu(), and once in rcu_dereference_raw()
>
> While barrier();ptr = rcu_dereference(X); does generate a single load.

It's true.

Unfortunately, using barrier() can also prevent gcc from making some
(acceptable) code optimizations, because barrier() has a global effect,
while all we want to reload is the (head->first) pointer.
So, to be absolutely precise, we would have to introduce and use
the ACCESS_FIELD_ONCE() macro.

In any case, it doesn't look like a big problem.
To my mind, the best solution is to use the ACCESS_FIELD_ONCE() macro,
but using barrier() is also an acceptable solution.

Regards,
Roman

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-29 10:09                                       ` Roman Gushchin
@ 2013-05-29 19:06                                         ` Eric Dumazet
  2013-05-30  8:25                                           ` Roman Gushchin
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Dumazet @ 2013-05-29 19:06 UTC (permalink / raw)
  To: Roman Gushchin, David Miller
  Cc: paulmck, Jesper Dangaard Brouer, Dipankar Sarma, zhmurov,
	linux-kernel, netdev, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy

On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:

> Unfortunately, using barrier() can also prevent gcc from making some
> (acceptable) code optimizations, because barrier() has a global effect,
> and everything we want to reload is the (head->first) pointer.
> So, to be absolutely precise, we have to introduce and use
> the ACCESS_FIELD_ONCE() macro.

Yep, ACCESS_ONCE() seems not to work for a field, at least not the
ACCESS_ONCE() used in __rcu_dereference_check().

> 
> In any case, it doesn't look like a big problem.
> In my mind, the best solution is to use the ACCESS_FIELD_ONCE() macro,
> but using barrier() is also an acceptable solution.
> 

True, these lookup functions are usually structured the same around the
hlist_nulls_for_each_entry_rcu() loop.

A barrier() right before the loop seems to be a benefit; the size of the
assembly code is reduced by 48 bytes.

And it's one of the documented ways to handle this kind of problem
(Documentation/atomic_ops.txt line 114).

I guess we should amend this documentation, eventually.

Thanks, please add your "Signed-off-by" if you agree with the patch.


[PATCH] net: force a reload of first item in hlist_nulls_for_each_entry_rcu

Roman Gushchin discovered that udp4_lib_lookup2() was not reloading
first item in the rcu protected list, in case the loop was restarted.

This produced soft lockups as in https://lkml.org/lkml/2013/4/16/371

rcu_dereference(X)/ACCESS_ONCE(X) seem to not work as intended if X is
ptr->field :

In some cases, gcc caches the value of ptr->field in a register.

Use a barrier() to disallow such caching, as documented in
Documentation/atomic_ops.txt line 114

Thanks a lot to Roman for providing analysis and numerous patches.

Diagnosed-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
---
 include/linux/rculist_nulls.h |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..c7557fa 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -105,9 +105,14 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
  * @head:	the head for your list.
  * @member:	the name of the hlist_nulls_node within the struct.
  *
+ * The barrier() is needed to make sure compiler doesn't cache first element [1],
+ * as this loop can be restarted [2]
+ * [1] Documentation/atomic_ops.txt around line 114
+ * [2] Documentation/RCU/rculist_nulls.txt around line 146 
  */
 #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)			\
-	for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));		\
+	for (({barrier();}),							\
+	     pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));		\
 		(!is_a_nulls(pos)) &&						\
 		({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
 		pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-29 19:06                                         ` Eric Dumazet
@ 2013-05-30  8:25                                           ` Roman Gushchin
  2013-06-02 23:31                                             ` Eric Dumazet
  0 siblings, 1 reply; 57+ messages in thread
From: Roman Gushchin @ 2013-05-30  8:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, paulmck, Jesper Dangaard Brouer, Dipankar Sarma,
	zhmurov, linux-kernel, netdev, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, David Laight

On 29.05.2013 23:06, Eric Dumazet wrote:
> On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
>
> True, these lookup functions are usually structured the same around the
> hlist_nulls_for_each_entry_rcu() loop.
>
> A barrier() right before the loop seems to be a benefit, the size of
> assembly code is reduced by 48 bytes.
>
> And its one of the documented way to handle this kind of problems
> (Documentation/atomic_ops.txt line 114)
>
> I guess we should amend this documentation, eventually.
>
> Thanks, please add you "Signed-off-by" if you agree with the patch.
>

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>

Many thanks to you, Paul E. McKenney and David Laight for your
patches, help and participation in this discussion.

>
> [PATCH] net: force a reload of first item in hlist_nulls_for_each_entry_rcu
>
> Roman Gushchin discovered that udp4_lib_lookup2() was not reloading
> first item in the rcu protected list, in case the loop was restarted.
>
> This produced soft lockups as in https://lkml.org/lkml/2013/4/16/37
>
> rcu_dereference(X)/ACCESS_ONCE(X) seem to not work as intended if X is
> ptr->field :
>
> In some cases, gcc caches the value or ptr->field in a register.
>
> Use a barrier() to disallow such caching, as documented in
> Documentation/atomic_ops.txt line 114
>
> Thanks a lot to Roman for providing analysis and numerous patches.
>
> Diagnosed-by: Roman Gushchin <klamm@yandex-team.ru>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
> ---
>   include/linux/rculist_nulls.h |    7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 2ae1371..c7557fa 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -105,9 +105,14 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
>    * @head:	the head for your list.
>    * @member:	the name of the hlist_nulls_node within the struct.
>    *
> + * The barrier() is needed to make sure compiler doesn't cache first element [1],
> + * as this loop can be restarted [2]
> + * [1] Documentation/atomic_ops.txt around line 114
> + * [2] Documentation/RCU/rculist_nulls.txt around line 146
>    */
>   #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)			\
> -	for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));		\
> +	for (({barrier();}),							\
> +	     pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));		\
>   		(!is_a_nulls(pos)) &&						\
>   		({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>   		pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>
>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-05-30  8:25                                           ` Roman Gushchin
@ 2013-06-02 23:31                                             ` Eric Dumazet
  2013-06-03  2:58                                               ` David Miller
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Dumazet @ 2013-06-02 23:31 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: David Miller, paulmck, Jesper Dangaard Brouer, Dipankar Sarma,
	zhmurov, linux-kernel, netdev, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, David Laight

On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> On 29.05.2013 23:06, Eric Dumazet wrote:
> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> >
> > True, these lookup functions are usually structured the same around the
> > hlist_nulls_for_each_entry_rcu() loop.
> >
> > A barrier() right before the loop seems to be a benefit, the size of
> > assembly code is reduced by 48 bytes.
> >
> > And its one of the documented way to handle this kind of problems
> > (Documentation/atomic_ops.txt line 114)
> >
> > I guess we should amend this documentation, eventually.
> >
> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> >
> 
> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> 
> Many thanks to you, Paul E. McKenney and David Laight for your
> patches, help and participation in this discussion.

Thanks to you !

David, is there any problem with the patch ?

http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
tell me what is needed to get it merged.

Thanks !




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-06-02 23:31                                             ` Eric Dumazet
@ 2013-06-03  2:58                                               ` David Miller
  2013-06-03  3:12                                                 ` Eric Dumazet
  2013-06-03  3:42                                                 ` Paul E. McKenney
  0 siblings, 2 replies; 57+ messages in thread
From: David Miller @ 2013-06-03  2:58 UTC (permalink / raw)
  To: eric.dumazet
  Cc: klamm, paulmck, brouer, dipankar, zhmurov, linux-kernel, netdev,
	kuznet, jmorris, yoshfuji, kaber, David.Laight

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 02 Jun 2013 16:31:35 -0700

> On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
>> On 29.05.2013 23:06, Eric Dumazet wrote:
>> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
>> >
>> > True, these lookup functions are usually structured the same around the
>> > hlist_nulls_for_each_entry_rcu() loop.
>> >
>> > A barrier() right before the loop seems to be a benefit, the size of
>> > assembly code is reduced by 48 bytes.
>> >
>> > And its one of the documented way to handle this kind of problems
>> > (Documentation/atomic_ops.txt line 114)
>> >
>> > I guess we should amend this documentation, eventually.
>> >
>> > Thanks, please add you "Signed-off-by" if you agree with the patch.
>> >
>> 
>> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
>> 
>> Many thanks to you, Paul E. McKenney and David Laight for your
>> patches, help and participation in this discussion.
> 
> Thanks to you !
> 
> David, is there any problem with the patch ?
> 
> http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> tell me what is needed to get it merged.

It's not a networking patch, it's a patch for generic RCU upstream.
So it either goes through Paul McKenney, or directly via Linus on
linux-kernel.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-06-03  2:58                                               ` David Miller
@ 2013-06-03  3:12                                                 ` Eric Dumazet
  2013-06-03  3:27                                                   ` David Miller
  2013-06-03  3:48                                                   ` Paul E. McKenney
  2013-06-03  3:42                                                 ` Paul E. McKenney
  1 sibling, 2 replies; 57+ messages in thread
From: Eric Dumazet @ 2013-06-03  3:12 UTC (permalink / raw)
  To: David Miller
  Cc: klamm, paulmck, brouer, dipankar, zhmurov, linux-kernel, netdev,
	kuznet, jmorris, yoshfuji, kaber, David.Laight

On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 02 Jun 2013 16:31:35 -0700
> 
> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> >> On 29.05.2013 23:06, Eric Dumazet wrote:
> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> >> >
> >> > True, these lookup functions are usually structured the same around the
> >> > hlist_nulls_for_each_entry_rcu() loop.
> >> >
> >> > A barrier() right before the loop seems to be a benefit, the size of
> >> > assembly code is reduced by 48 bytes.
> >> >
> >> > And its one of the documented way to handle this kind of problems
> >> > (Documentation/atomic_ops.txt line 114)
> >> >
> >> > I guess we should amend this documentation, eventually.
> >> >
> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> >> >
> >> 
> >> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> >> 
> >> Many thanks to you, Paul E. McKenney and David Laight for your
> >> patches, help and participation in this discussion.
> > 
> > Thanks to you !
> > 
> > David, is there any problem with the patch ?
> > 
> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> > tell me what is needed to get it merged.
> 
> It's not a networking patch, it's a patch for generic RCU upstream.
> So it either goes through Paul McKenney, or directly via Linus on
> linux-kernel.

Well, the whole discussion went on linux-kernel, and you were the original
committer of this patch five years ago.

http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11

So please Paul or David make sure the patch goes in.

My only concern is that Paul seems quite busy these days.

Thanks



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-06-03  3:12                                                 ` Eric Dumazet
@ 2013-06-03  3:27                                                   ` David Miller
  2013-06-03  3:42                                                     ` Paul E. McKenney
  2013-06-03  3:48                                                   ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: David Miller @ 2013-06-03  3:27 UTC (permalink / raw)
  To: eric.dumazet
  Cc: klamm, paulmck, brouer, dipankar, zhmurov, linux-kernel, netdev,
	kuznet, jmorris, yoshfuji, kaber, David.Laight

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 02 Jun 2013 20:12:49 -0700

> On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Sun, 02 Jun 2013 16:31:35 -0700
>> 
>> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
>> >> On 29.05.2013 23:06, Eric Dumazet wrote:
>> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
>> >> >
>> >> > True, these lookup functions are usually structured the same around the
>> >> > hlist_nulls_for_each_entry_rcu() loop.
>> >> >
>> >> > A barrier() right before the loop seems to be a benefit, the size of
>> >> > assembly code is reduced by 48 bytes.
>> >> >
>> >> > And its one of the documented way to handle this kind of problems
>> >> > (Documentation/atomic_ops.txt line 114)
>> >> >
>> >> > I guess we should amend this documentation, eventually.
>> >> >
>> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
>> >> >
>> >> 
>> >> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
>> >> 
>> >> Many thanks to you, Paul E. McKenney and David Laight for your
>> >> patches, help and participation in this discussion.
>> > 
>> > Thanks to you !
>> > 
>> > David, is there any problem with the patch ?
>> > 
>> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
>> > tell me what is needed to get it merged.
>> 
>> It's not a networking patch, it's a patch for generic RCU upstream.
>> So it either goes through Paul McKenney, or directly via Linus on
>> linux-kernel.
> 
> Well, whole discussion went on linux kernel, and you were the original
> committer of this patch five years ago.
> 
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
> 
> So please Paul or David make sure the patch goes in.
> 
> My only concern is that Paul seems quite busy these days.

Ok I can take it then, no problem.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-06-03  2:58                                               ` David Miller
  2013-06-03  3:12                                                 ` Eric Dumazet
@ 2013-06-03  3:42                                                 ` Paul E. McKenney
  1 sibling, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-06-03  3:42 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, klamm, brouer, dipankar, zhmurov, linux-kernel,
	netdev, kuznet, jmorris, yoshfuji, kaber, David.Laight

On Sun, Jun 02, 2013 at 07:58:26PM -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 02 Jun 2013 16:31:35 -0700
> 
> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> >> On 29.05.2013 23:06, Eric Dumazet wrote:
> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> >> >
> >> > True, these lookup functions are usually structured the same around the
> >> > hlist_nulls_for_each_entry_rcu() loop.
> >> >
> >> > A barrier() right before the loop seems to be a benefit, the size of
> >> > assembly code is reduced by 48 bytes.
> >> >
> >> > And its one of the documented way to handle this kind of problems
> >> > (Documentation/atomic_ops.txt line 114)
> >> >
> >> > I guess we should amend this documentation, eventually.
> >> >
> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> >> >
> >> 
> >> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> >> 
> >> Many thanks to you, Paul E. McKenney and David Laight for your
> >> patches, help and participation in this discussion.
> > 
> > Thanks to you !
> > 
> > David, is there any problem with the patch ?
> > 
> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> > tell me what is needed to get it merged.
> 
> It's not a networking patch, it's a patch for generic RCU upstream.
> So it either goes through Paul McKenney, or directly via Linus on
> linux-kernel.

I would be happy to take it.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-06-03  3:27                                                   ` David Miller
@ 2013-06-03  3:42                                                     ` Paul E. McKenney
  2013-06-03  3:47                                                       ` Eric Dumazet
  2013-06-03  3:49                                                       ` David Miller
  0 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-06-03  3:42 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, klamm, brouer, dipankar, zhmurov, linux-kernel,
	netdev, kuznet, jmorris, yoshfuji, kaber, David.Laight

On Sun, Jun 02, 2013 at 08:27:03PM -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 02 Jun 2013 20:12:49 -0700
> 
> > On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
> >> From: Eric Dumazet <eric.dumazet@gmail.com>
> >> Date: Sun, 02 Jun 2013 16:31:35 -0700
> >> 
> >> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> >> >> On 29.05.2013 23:06, Eric Dumazet wrote:
> >> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> >> >> >
> >> >> > True, these lookup functions are usually structured the same around the
> >> >> > hlist_nulls_for_each_entry_rcu() loop.
> >> >> >
> >> >> > A barrier() right before the loop seems to be a benefit, the size of
> >> >> > assembly code is reduced by 48 bytes.
> >> >> >
> >> >> > And its one of the documented way to handle this kind of problems
> >> >> > (Documentation/atomic_ops.txt line 114)
> >> >> >
> >> >> > I guess we should amend this documentation, eventually.
> >> >> >
> >> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> >> >> >
> >> >> 
> >> >> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> >> >> 
> >> >> Many thanks to you, Paul E. McKenney and David Laight for your
> >> >> patches, help and participation in this discussion.
> >> > 
> >> > Thanks to you !
> >> > 
> >> > David, is there any problem with the patch ?
> >> > 
> >> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> >> > tell me what is needed to get it merged.
> >> 
> >> It's not a networking patch, it's a patch for generic RCU upstream.
> >> So it either goes through Paul McKenney, or directly via Linus on
> >> linux-kernel.
> > 
> > Well, whole discussion went on linux kernel, and you were the original
> > committer of this patch five years ago.
> > 
> > http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
> > 
> > So please Paul or David make sure the patch goes in.
> > 
> > My only concern is that Paul seems quite busy these days.
> 
> Ok I can take it then, no problem.

Or if you would like to take it:

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Just let me know.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-06-03  3:42                                                     ` Paul E. McKenney
@ 2013-06-03  3:47                                                       ` Eric Dumazet
  2013-06-03  3:49                                                       ` David Miller
  1 sibling, 0 replies; 57+ messages in thread
From: Eric Dumazet @ 2013-06-03  3:47 UTC (permalink / raw)
  To: paulmck
  Cc: David Miller, klamm, brouer, dipankar, zhmurov, linux-kernel,
	netdev, kuznet, jmorris, yoshfuji, kaber, David.Laight

On Sun, 2013-06-02 at 20:42 -0700, Paul E. McKenney wrote:

> Or if you would like to take it:
> 
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 

Thanks guys !



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-06-03  3:12                                                 ` Eric Dumazet
  2013-06-03  3:27                                                   ` David Miller
@ 2013-06-03  3:48                                                   ` Paul E. McKenney
  1 sibling, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-06-03  3:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, klamm, brouer, dipankar, zhmurov, linux-kernel,
	netdev, kuznet, jmorris, yoshfuji, kaber, David.Laight

On Sun, Jun 02, 2013 at 08:12:49PM -0700, Eric Dumazet wrote:
> On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Sun, 02 Jun 2013 16:31:35 -0700
> > 
> > > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> > >> On 29.05.2013 23:06, Eric Dumazet wrote:
> > >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> > >> >
> > >> > True, these lookup functions are usually structured the same around the
> > >> > hlist_nulls_for_each_entry_rcu() loop.
> > >> >
> > >> > A barrier() right before the loop seems to be a benefit, the size of
> > >> > assembly code is reduced by 48 bytes.
> > >> >
> > >> > And its one of the documented way to handle this kind of problems
> > >> > (Documentation/atomic_ops.txt line 114)
> > >> >
> > >> > I guess we should amend this documentation, eventually.
> > >> >
> > >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> > >> >
> > >> 
> > >> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> > >> 
> > >> Many thanks to you, Paul E. McKenney and David Laight for your
> > >> patches, help and participation in this discussion.
> > > 
> > > Thanks to you !
> > > 
> > > David, is there any problem with the patch ?
> > > 
> > > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> > > tell me what is needed to get it merged.
> > 
> > It's not a networking patch, it's a patch for generic RCU upstream.
> > So it either goes through Paul McKenney, or directly via Linus on
> > linux-kernel.
> 
> Well, whole discussion went on linux kernel, and you were the original
> committer of this patch five years ago.
> 
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
> 
> So please Paul or David make sure the patch goes in.
> 
> My only concern is that Paul seems quite busy these days.

One reason for the longer-than-usual response delays is my current timezone,
IST.  ;-)  Back to PDT on the 10th.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-06-03  3:42                                                     ` Paul E. McKenney
  2013-06-03  3:47                                                       ` Eric Dumazet
@ 2013-06-03  3:49                                                       ` David Miller
  2013-06-03  6:05                                                         ` Paul E. McKenney
  2013-06-10 18:29                                                         ` Boris B. Zhmurov
  1 sibling, 2 replies; 57+ messages in thread
From: David Miller @ 2013-06-03  3:49 UTC (permalink / raw)
  To: paulmck
  Cc: eric.dumazet, klamm, brouer, dipankar, zhmurov, linux-kernel,
	netdev, kuznet, jmorris, yoshfuji, kaber, David.Laight

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Sun, 2 Jun 2013 20:42:54 -0700

> On Sun, Jun 02, 2013 at 08:27:03PM -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Sun, 02 Jun 2013 20:12:49 -0700
>> 
>> > On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
>> >> From: Eric Dumazet <eric.dumazet@gmail.com>
>> >> Date: Sun, 02 Jun 2013 16:31:35 -0700
>> >> 
>> >> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
>> >> >> On 29.05.2013 23:06, Eric Dumazet wrote:
>> >> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
>> >> >> >
>> >> >> > True, these lookup functions are usually structured the same around the
>> >> >> > hlist_nulls_for_each_entry_rcu() loop.
>> >> >> >
>> >> >> > A barrier() right before the loop seems to be a benefit, the size of
>> >> >> > assembly code is reduced by 48 bytes.
>> >> >> >
>> >> >> > And its one of the documented way to handle this kind of problems
>> >> >> > (Documentation/atomic_ops.txt line 114)
>> >> >> >
>> >> >> > I guess we should amend this documentation, eventually.
>> >> >> >
>> >> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
>> >> >> >
>> >> >> 
>> >> >> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
>> >> >> 
>> >> >> Many thanks to you, Paul E. McKenney and David Laight for your
>> >> >> patches, help and participation in this discussion.
>> >> > 
>> >> > Thanks to you !
>> >> > 
>> >> > David, is there any problem with the patch ?
>> >> > 
>> >> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
>> >> > tell me what is needed to get it merged.
>> >> 
>> >> It's not a networking patch, it's a patch for generic RCU upstream.
>> >> So it either goes through Paul McKenney, or directly via Linus on
>> >> linux-kernel.
>> > 
>> > Well, whole discussion went on linux kernel, and you were the original
>> > committer of this patch five years ago.
>> > 
>> > http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
>> > 
>> > So please Paul or David make sure the patch goes in.
>> > 
>> > My only concern is that Paul seems quite busy these days.
>> 
>> Ok I can take it then, no problem.
> 
> Or if you would like to take it:
> 
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Let's end this deadlock by me taking it :-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-06-03  3:49                                                       ` David Miller
@ 2013-06-03  6:05                                                         ` Paul E. McKenney
  2013-06-10 18:29                                                         ` Boris B. Zhmurov
  1 sibling, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2013-06-03  6:05 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, klamm, brouer, dipankar, zhmurov, linux-kernel,
	netdev, kuznet, jmorris, yoshfuji, kaber, David.Laight

On Sun, Jun 02, 2013 at 08:49:15PM -0700, David Miller wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Date: Sun, 2 Jun 2013 20:42:54 -0700
> 
> > On Sun, Jun 02, 2013 at 08:27:03PM -0700, David Miller wrote:
> >> From: Eric Dumazet <eric.dumazet@gmail.com>
> >> Date: Sun, 02 Jun 2013 20:12:49 -0700
> >> 
> >> > On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
> >> >> From: Eric Dumazet <eric.dumazet@gmail.com>
> >> >> Date: Sun, 02 Jun 2013 16:31:35 -0700
> >> >> 
> >> >> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> >> >> >> On 29.05.2013 23:06, Eric Dumazet wrote:
> >> >> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> >> >> >> >
> >> >> >> > True, these lookup functions are usually structured the same around the
> >> >> >> > hlist_nulls_for_each_entry_rcu() loop.
> >> >> >> >
> >> >> >> > A barrier() right before the loop seems to be a benefit, the size of
> >> >> >> > assembly code is reduced by 48 bytes.
> >> >> >> >
> >> >> >> > And its one of the documented way to handle this kind of problems
> >> >> >> > (Documentation/atomic_ops.txt line 114)
> >> >> >> >
> >> >> >> > I guess we should amend this documentation, eventually.
> >> >> >> >
> >> >> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> >> >> >> >
> >> >> >> 
> >> >> >> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> >> >> >> 
> >> >> >> Many thanks to you, Paul E. McKenney and David Laight for your
> >> >> >> patches, help and participation in this discussion.
> >> >> > 
> >> >> > Thanks to you !
> >> >> > 
> >> >> > David, is there any problem with the patch ?
> >> >> > 
> >> >> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> >> >> > tell me what is needed to get it merged.
> >> >> 
> >> >> It's not a networking patch, it's a patch for generic RCU upstream.
> >> >> So it either goes through Paul McKenney, or directly via Linus on
> >> >> linux-kernel.
> >> > 
> >> > Well, whole discussion went on linux kernel, and you were the original
> >> > committer of this patch five years ago.
> >> > 
> >> > http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
> >> > 
> >> > So please Paul or David make sure the patch goes in.
> >> > 
> >> > My only concern is that Paul seems quite busy these days.
> >> 
> >> Ok I can take it then, no problem.
> > 
> > Or if you would like to take it:
> > 
> > Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Let's end this deadlock by me taking it :-)

You got it!  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-06-03  3:49                                                       ` David Miller
  2013-06-03  6:05                                                         ` Paul E. McKenney
@ 2013-06-10 18:29                                                         ` Boris B. Zhmurov
  2013-06-10 18:51                                                           ` Eric Dumazet
  1 sibling, 1 reply; 57+ messages in thread
From: Boris B. Zhmurov @ 2013-06-10 18:29 UTC (permalink / raw)
  To: David Miller
  Cc: paulmck, eric.dumazet, klamm, brouer, dipankar, zhmurov,
	linux-kernel, netdev, kuznet, jmorris, yoshfuji, kaber,
	David.Laight, stable, gregkh

On 06/03/2013 07:49 AM, David Miller wrote:


>>>> Well, whole discussion went on linux kernel, and you were the original
>>>> committer of this patch five years ago.
>>>>
>>>> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
>>>>
>>>> So please Paul or David make sure the patch goes in.
>>>>
>>>> My only concern is that Paul seems quite busy these days.
>>>
>>> Ok I can take it then, no problem.
>>
>> Or if you would like to take it:
>>
>> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Let's end this deadlock by me taking it :-)


Hello David,

Do you think this patch is "stable material" that should be included in
the 3.0-stable and 3.4-stable trees? Thanks.

-- 
Boris B. Zhmurov
Yandex.Search SRE Team
*************
т.: +7 (495) 739-70-00 ext.7160
http://yandex.ru
*************

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
  2013-06-10 18:29                                                         ` Boris B. Zhmurov
@ 2013-06-10 18:51                                                           ` Eric Dumazet
  0 siblings, 0 replies; 57+ messages in thread
From: Eric Dumazet @ 2013-06-10 18:51 UTC (permalink / raw)
  To: zhmurov
  Cc: David Miller, paulmck, klamm, brouer, dipankar, linux-kernel,
	netdev, kuznet, jmorris, yoshfuji, kaber, David.Laight, stable,
	gregkh

On Mon, 2013-06-10 at 22:29 +0400, Boris B. Zhmurov wrote:

> Hello David,
> 
> Do you think, this patch is "stable material" and should be included in
> 3.0-stable and 3.4-stable trees? Thanks.

Yes, it's in the stable queue:

http://patchwork.ozlabs.org/bundle/davem/stable/?state=*




^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2013-06-10 18:52 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-21  9:05 [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro Roman Gushchin
2013-05-21 10:40 ` David Laight
2013-05-21 11:55   ` Roman Gushchin
2013-05-21 13:42     ` David Laight
2013-05-21 12:09 ` Paul E. McKenney
2013-05-21 12:46   ` Roman Gushchin
2013-05-21 12:58     ` Paul E. McKenney
2013-05-21 13:37   ` Eric Dumazet
2013-05-21 13:44   ` Eric Dumazet
2013-05-21 14:47     ` Roman Gushchin
2013-05-21 15:16       ` Eric Dumazet
2013-05-21 15:51         ` Roman Gushchin
2013-05-21 15:38       ` Eric Dumazet
2013-05-21 15:51         ` Roman Gushchin
2013-05-21 18:12         ` [PATCH v2] " Roman Gushchin
2013-05-22  2:01           ` Eric Dumazet
2013-05-22  5:49             ` Eric Dumazet
2013-05-22 11:58               ` Roman Gushchin
2013-05-22 12:30                 ` Eric Dumazet
2013-05-22 13:07                   ` Roman Gushchin
2013-05-22 17:45                     ` Paul E. McKenney
2013-05-22 19:17                       ` Roman Gushchin
2013-05-25 11:37                         ` Paul E. McKenney
2013-05-27 11:34                           ` Roman Gushchin
2013-05-27 17:55                           ` Roman Gushchin
2013-05-28  0:12                             ` Eric Dumazet
2013-05-28  9:10                               ` Roman Gushchin
2013-05-29  0:34                                 ` Eric Dumazet
2013-05-29  1:31                                   ` Paul E. McKenney
2013-05-29  5:08                                     ` Eric Dumazet
2013-05-29 10:09                                       ` Roman Gushchin
2013-05-29 19:06                                         ` Eric Dumazet
2013-05-30  8:25                                           ` Roman Gushchin
2013-06-02 23:31                                             ` Eric Dumazet
2013-06-03  2:58                                               ` David Miller
2013-06-03  3:12                                                 ` Eric Dumazet
2013-06-03  3:27                                                   ` David Miller
2013-06-03  3:42                                                     ` Paul E. McKenney
2013-06-03  3:47                                                       ` Eric Dumazet
2013-06-03  3:49                                                       ` David Miller
2013-06-03  6:05                                                         ` Paul E. McKenney
2013-06-10 18:29                                                         ` Boris B. Zhmurov
2013-06-10 18:51                                                           ` Eric Dumazet
2013-06-03  3:48                                                   ` Paul E. McKenney
2013-06-03  3:42                                                 ` Paul E. McKenney
2013-05-29  9:17                                   ` Roman Gushchin
2013-05-29  1:19                                 ` Paul E. McKenney
2013-05-22 13:27                   ` David Laight
2013-05-22 13:36                     ` Eric Dumazet
2013-05-22 14:23                       ` David Laight
2013-05-22 13:55                     ` Roman Gushchin
2013-05-22  9:58             ` Paul E. McKenney
2013-05-22 12:28               ` Eric Dumazet
2013-05-22 13:00                 ` Paul E. McKenney
2013-05-22 14:16                   ` Eric Dumazet
