* [PATCH] kernel: fix data race in put_pid
@ 2015-09-17 13:24 Dmitry Vyukov
  2015-09-17 16:08 ` Oleg Nesterov
  0 siblings, 1 reply; 40+ messages in thread
From: Dmitry Vyukov @ 2015-09-17 13:24 UTC
  To: ebiederm, viro, akpm, oleg, mingo, paulmck, mhocko
  Cc: linux-kernel, ktsan, kcc, andreyknvl, glider, hboehm, Dmitry Vyukov

put_pid checks whether the current thread has the only reference
to the pid with atomic_read(), which implies no memory barriers,
and if so proceeds directly to kmem_cache_free().
As a result, memory accesses to the object in kmem_cache_free(),
or user accesses to the object after reallocation (again without
any memory barriers on the fast path), can hoist above the
atomic_read() check and conflict with memory accesses to the pid
object made by other threads before they released their references.

There is a control dependency between the atomic_read() check and
kmem_cache_free(), but control dependencies are disregarded by some
architectures. The CONTROL DEPENDENCIES section of
Documentation/memory-barriers.txt explicitly states:
"A load-load control dependency requires a full read memory barrier.
... please note that READ_ONCE_CTRL() is not optional! [even for stores]"

For example, if the store to the first word of the object that builds
the freelist link in kmem_cache_free() hoists above the check, stores
to the first word by other threads can corrupt the memory allocator
freelist.
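
Schematically, the allocator free fast path does something like the
following (a simplified sketch of the slab internals, not the exact
code):

	*(void **)pid = head->freelist;	/* link object: store to its first word */
	head->freelist = (void *)pid;	/* publish it as the new freelist head */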

Use atomic_read_acquire() for the fast-path check to hand off a
properly acquired object to the memory allocator.
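
Assuming the generic fallback in include/linux/atomic.h (architectures
may override it), the acquire read boils down to:

	#ifndef atomic_read_acquire
	#define atomic_read_acquire(v)	smp_load_acquire(&(v)->counter)
	#endif

which prevents the subsequent frees from being reordered before the check.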

The data race was found with KernelThreadSanitizer (KTSAN).

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
---
 kernel/pid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index ca36879..3b0b13d 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -242,7 +242,7 @@ void put_pid(struct pid *pid)
 		return;
 
 	ns = pid->numbers[pid->level].ns;
-	if ((atomic_read(&pid->count) == 1) ||
+	if ((atomic_read_acquire(&pid->count) == 1) ||
 	     atomic_dec_and_test(&pid->count)) {
 		kmem_cache_free(ns->pid_cachep, pid);
 		put_pid_ns(ns);
-- 
2.6.0.rc0.131.gf624c3d



* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-17 13:24 [PATCH] kernel: fix data race in put_pid Dmitry Vyukov
@ 2015-09-17 16:08 ` Oleg Nesterov
  2015-09-17 16:41   ` Dmitry Vyukov
  0 siblings, 1 reply; 40+ messages in thread
From: Oleg Nesterov @ 2015-09-17 16:08 UTC
  To: Dmitry Vyukov
  Cc: ebiederm, viro, akpm, mingo, paulmck, mhocko, linux-kernel,
	ktsan, kcc, andreyknvl, glider, hboehm, Peter Zijlstra

Honestly, I cannot see how this can happen. So I do not really
understand the problem and the fix.

And if this can happen, I can't understand how this patch can help.
What about "ns = pid->numbers[pid->level].ns"? It can be reordered
with atomic_read_acquire().

I leave this to other reviewers, but perhaps you can spell out the
"For example" part of the changelog.



* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-17 16:08 ` Oleg Nesterov
@ 2015-09-17 16:41   ` Dmitry Vyukov
  2015-09-17 17:44     ` Oleg Nesterov
  2015-09-17 17:56     ` Paul E. McKenney
  0 siblings, 2 replies; 40+ messages in thread
From: Dmitry Vyukov @ 2015-09-17 16:41 UTC
  To: Oleg Nesterov
  Cc: ebiederm, Al Viro, Andrew Morton, Ingo Molnar, Paul McKenney,
	mhocko, LKML, ktsan, Kostya Serebryany, Andrey Konovalov,
	Alexander Potapenko, Hans Boehm, Peter Zijlstra

What happens here exactly matches what is described in the CONTROL
DEPENDENCIES section of Documentation/memory-barriers.txt. So all the
bad things described there are possible here. The document explicitly
requires the use of rmb/acquire/READ_ONCE_CTRL in such cases. I don't
know what to add to that.

Regarding reordering of "ns = pid->numbers[pid->level].ns": if we are
talking about the thread that releases the last reference via the
fast-path atomic_read() check, then yes, the two can be reordered by
both compiler and hardware, but only in a non-observable way, so it
does not matter. It is non-observable because the freeing thread is
the same thread that executes "ns = pid->numbers[pid->level].ns", and
both compilers and hardware preserve program order as observed from
within a single thread.
If we are talking about the threads that release all but the last
reference, then no, they cannot be reordered, because those threads
execute atomic_dec_and_test(), which is a release operation.
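
Schematically, the pairing the fix relies on looks like this (an
illustrative sketch, not actual kernel code; pid->foo stands for an
arbitrary field of the object):

	/* Thread releasing a non-last reference: */
	pid->foo = 1;				/* plain stores to the object */
	atomic_dec_and_test(&pid->count);	/* release: stores above cannot sink below */

	/* Thread taking the fast path: */
	if (atomic_read_acquire(&pid->count) == 1)	/* acquire: pairs with the release */
		kmem_cache_free(ns->pid_cachep, pid);	/* free cannot hoist above the check */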





* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-17 16:41   ` Dmitry Vyukov
@ 2015-09-17 17:44     ` Oleg Nesterov
  2015-09-17 17:57       ` Dmitry Vyukov
  2015-09-17 17:56     ` Paul E. McKenney
  1 sibling, 1 reply; 40+ messages in thread
From: Oleg Nesterov @ 2015-09-17 17:44 UTC
  To: Dmitry Vyukov
  Cc: ebiederm, Al Viro, Andrew Morton, Ingo Molnar, Paul McKenney,
	mhocko, LKML, ktsan, Kostya Serebryany, Andrey Konovalov,
	Alexander Potapenko, Hans Boehm, Peter Zijlstra

On 09/17, Dmitry Vyukov wrote:
>
> What happens here exactly matches what is described in CONTROL
> DEPENDENCIES section of Documentation/memory-barriers.txt. So all the
> bad things described there are possible here.

And I still can't understand how these bad things connect to put_pid().
Probably I should re-read memory-barriers.txt, it changes quite often.

> I don't
> know what to add to that.

OK, let me quote the parts of your changelog,

	For example, if store to the first word of the object to build a freelist
	in kmem_cache_free() hoists above the check, stores to the first word
	in other threads can corrupt the memory allocator freelist.

I simply can't parse this. Yes, this is probably because of my bad
English, but I'd appreciate it if you could explain at least, say,
"stores to the first word in other threads".

Did you mean that a freed pid can be reallocated by another thread,
then overwritten, and this all can happen before atomic_read(count)?


Hmm, or perhaps you meant that the "last" put_pid(), which observes
atomic_read() == 1, can race with another thread which writes to this
pid and does put_pid()? That is another story, and if you meant this,
the changelog should clearly explain your concerns.

Or what?


So let me repeat. Since I can't understand you, I leave this to other
reviewers. But imho the changelog should be updated in any case.

Oleg.



* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-17 16:41   ` Dmitry Vyukov
  2015-09-17 17:44     ` Oleg Nesterov
@ 2015-09-17 17:56     ` Paul E. McKenney
  1 sibling, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2015-09-17 17:56 UTC
  To: Dmitry Vyukov
  Cc: Oleg Nesterov, ebiederm, Al Viro, Andrew Morton, Ingo Molnar,
	mhocko, LKML, ktsan, Kostya Serebryany, Andrey Konovalov,
	Alexander Potapenko, Hans Boehm, Peter Zijlstra

On Thu, Sep 17, 2015 at 06:41:46PM +0200, Dmitry Vyukov wrote:
> What happens here exactly matches what is described in CONTROL
> DEPENDENCIES section of Documentation/memory-barriers.txt. So all the
> bad things described there are possible here. The document explicitly
> requires usage of rmb/acquire/READ_ONCE_CTRL in such cases. I don't
> know what to add to that.

I suggest explicitly listing the steps leading to a specific instance
of the failure.  This isn't always easy to do, but it can be very helpful.

> Regarding reordering of "ns = pid->numbers[pid->level].ns". If we are
> talking about the thread that releases the last reference via the
> fast-path atomic_read check, then, yes, they can be reordered by both
> compiler and hardware, but only in a non-observable way, and so it
> does not matter. Only in a non-observable way because the freeing
> thread is the same as the thread that does "ns =
> pid->numbers[pid->level].ns" and both compilers and hardware visibly
> preserve program order for single-thread.

Agreed.  If the compiler or the CPU did that reordering in a way that was
visible, that would be a bug even in single-threaded code.  Therefore,
if that reordering matters, both the compiler and the CPU are forbidden
from doing it.

> If we are talking about threads that release all but last reference,
> then, no, they can't be reordered, because those threads execute
> atomic_dec_and_test which is a release operation.

Also agreed.

							Thanx, Paul


* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-17 17:44     ` Oleg Nesterov
@ 2015-09-17 17:57       ` Dmitry Vyukov
  2015-09-17 17:59         ` Dmitry Vyukov
  2015-09-17 18:09         ` Oleg Nesterov
  0 siblings, 2 replies; 40+ messages in thread
From: Dmitry Vyukov @ 2015-09-17 17:57 UTC
  To: Oleg Nesterov
  Cc: ebiederm, Al Viro, Andrew Morton, Ingo Molnar, Paul McKenney,
	mhocko, LKML, ktsan, Kostya Serebryany, Andrey Konovalov,
	Alexander Potapenko, Hans Boehm, Peter Zijlstra

I can update the patch description, but let me explain it here first.

Here is the essence of what happens:

// thread 1
1: pid->foo = 1; // foo is the first word of pid object
// then it does put_pid
2: atomic_dec_and_test(&pid->count) // decrements count to 1 and returns false so the function returns

// thread 2
// executes put_pid
3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
// then kmem_cache_free does:
5: head->freelist = (void*)pid;

This can be executed as:

4: *(void**)pid = head->freelist;
1: pid->foo = 1; // foo is the first word of pid object
2: atomic_dec_and_test(&pid->count) // decrements count to 1 and returns false so the function returns
3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
5: head->freelist = (void*)pid;


And we get a corrupted allocator freelist.




* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-17 17:57       ` Dmitry Vyukov
@ 2015-09-17 17:59         ` Dmitry Vyukov
  2015-09-17 18:09         ` Oleg Nesterov
  1 sibling, 0 replies; 40+ messages in thread
From: Dmitry Vyukov @ 2015-09-17 17:59 UTC
  To: Oleg Nesterov
  Cc: ebiederm, Al Viro, Andrew Morton, Ingo Molnar, Paul McKenney,
	mhocko, LKML, ktsan, Kostya Serebryany, Andrey Konovalov,
	Alexander Potapenko, Hans Boehm, Peter Zijlstra

On Thu, Sep 17, 2015 at 7:57 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> I can update the patch description, but let me explain it here first.
>
> Here is the essence of what happens:
>
> // thread 1
> 1: pid->foo = 1; // foo is the first word of pid object
> // then it does put_pid
> 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
> returns false so the function returns
>
> // thread 2
> // executes put_pid
> 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
> // then kmem_cache_free does:

Oh, sorry, I messed it up.
4: *(void**)pid = head->freelist;
goes here.

> 5: head->freelist = (void*)pid;
>
> This can be executed as:
>
> 4: *(void**)pid = head->freelist;
> 1: pid->foo = 1; // foo is the first word of pid object
> 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
> returns false so the function returns
> 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
> 5: head->freelist = (void*)pid;
>
>
> And we get corrupted allocator freelist.

This scenario can be extended to all other words of the pid object,
because the memory allocator can generally write to any word of the
freed object (e.g. write 0xdeaddead into every word and then crash
upon discovering 0x1 in some word).


* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-17 17:57       ` Dmitry Vyukov
  2015-09-17 17:59         ` Dmitry Vyukov
@ 2015-09-17 18:09         ` Oleg Nesterov
  2015-09-17 18:38           ` Dmitry Vyukov
  2015-09-18  8:51           ` Peter Zijlstra
  1 sibling, 2 replies; 40+ messages in thread
From: Oleg Nesterov @ 2015-09-17 18:09 UTC
  To: Dmitry Vyukov
  Cc: ebiederm, Al Viro, Andrew Morton, Ingo Molnar, Paul McKenney,
	mhocko, LKML, ktsan, Kostya Serebryany, Andrey Konovalov,
	Alexander Potapenko, Hans Boehm, Peter Zijlstra

On 09/17, Dmitry Vyukov wrote:
>
> I can update the patch description, but let me explain it here first.

Yes thanks.

> Here is the essence of what happens:

Aha, so you really meant that 2 put_pid's can race with each other,

> // thread 1
> 1: pid->foo = 1; // foo is the first word of pid object
> // then it does put_pid
> 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
> returns false so the function returns
>
> // thread 2
> // executes put_pid
> 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
> // then kmem_cache_free does:
> 5: head->freelist = (void*)pid;
>
> This can be executed as:
>
> 4: *(void**)pid = head->freelist;
> 1: pid->foo = 1; // foo is the first word of pid object
> 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
> returns false so the function returns
> 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
> 5: head->freelist = (void*)pid;

Unless I am totally confused, everything is simpler. We can forget
about the hoisting, freelist, etc.

Thread 2 can see the result of atomic_dec_and_test(), but not the
result of "pid->foo = 1". In this case it can free the object, which
can then be re-allocated _before_ STORE(pid->foo) completes. Of course,
this would be really bad.

I need to recheck, but afaics this is not possible. This optimization
is fine, but probably needs a comment. We rely on delayed_put_pid()
called by RCU. And note that nobody can write to this pid after it
is removed from the rcu-protected list.
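
For reference, the RCU path is just a thin callback around put_pid()
(quoting kernel/pid.c from memory, so double-check the exact code):

	static void delayed_put_pid(struct rcu_head *rhp)
	{
		struct pid *pid = container_of(rhp, struct pid, rcu);
		put_pid(pid);
	}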

So I think this is a false alarm, but I'll try to recheck tomorrow; it
is too late for me today.

Oleg.



* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-17 18:09         ` Oleg Nesterov
@ 2015-09-17 18:38           ` Dmitry Vyukov
  2015-09-18  8:51           ` Peter Zijlstra
  1 sibling, 0 replies; 40+ messages in thread
From: Dmitry Vyukov @ 2015-09-17 18:38 UTC
  To: Oleg Nesterov
  Cc: ebiederm, Al Viro, Andrew Morton, Ingo Molnar, Paul McKenney,
	mhocko, LKML, ktsan, Kostya Serebryany, Andrey Konovalov,
	Alexander Potapenko, Hans Boehm, Peter Zijlstra

On Thu, Sep 17, 2015 at 8:09 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 09/17, Dmitry Vyukov wrote:
>>
>> I can update the patch description, but let me explain it here first.
>
> Yes thanks.
>
>> Here is the essence of what happens:
>
> Aha, so you really meant that 2 put_pid's can race with each other,
>
>> // thread 1
>> 1: pid->foo = 1; // foo is the first word of pid object
>> // then it does put_pid
>> 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
>> returns false so the function returns
>>
>> // thread 2
>> // executes put_pid
>> 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
>> // then kmem_cache_free does:
>> 5: head->freelist = (void*)pid;
>>
>> This can be executed as:
>>
>> 4: *(void**)pid = head->freelist;
>> 1: pid->foo = 1; // foo is the first word of pid object
>> 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
>> returns false so the function returns
>> 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
>> 5: head->freelist = (void*)pid;
>
> Unless I am totally confused, everything is simpler. We can forget
> about the hoisting, freelist, etc.
>
> Thread 2 can see the result of atomic_dec_and_test(), but not the
> result of "pid->foo = 1". In this case in can free the object which
> can be re-allocated _before_ STORE(pid->foo) completes. Of course,
> this would be really bad.


Yes, that's what I mean.
A missed memory barrier can break in lots of magical, complex ways, so
I generally prefer not to think about concrete scenarios. Passing a
non-acquired object to kfree is a data race.


> I need to recheck, but afaics this is not possible. This optimization
> is fine, but probably needs a comment. We rely on delayed_put_pid()
> called by RCU. And note that nobody can write to this pid after it
> is removed from the rcu-protected list.
>
> So I think this is false alarm, but I'll try to recheck tomorrow, it
> is too late for me today.

Well, if that were true, then put_pid would not contain any atomic
operations.
Here is the report from KTSAN that I observe:
Here is the report from KTSAN that I observe:

    ThreadSanitizer: data-race in kt_memblock_free

    Write of size 8 by thread T107 (K630):
     [<ffffffff812499ef>] kt_memblock_free+0xdf/0x150 mm/ktsan/memblock.c:90
     [<ffffffff81249704>] ktsan_memblock_free+0xc4/0xf0 mm/ktsan/ktsan.c:251 (discriminator 6)
     [<     inlined    >] kmem_cache_free+0x99/0x610 __cache_free mm/slab.c:3383
     [<ffffffff81239149>] kmem_cache_free+0x99/0x610 mm/slab.c:3561
     [<ffffffff810b4095>] put_pid+0x85/0xa0 kernel/pid.c:247
     [<ffffffff810b40ce>] delayed_put_pid+0x1e/0x30 kernel/pid.c:256
     [<     inlined    >] rcu_process_callbacks+0x410/0xa70 __rcu_reclaim kernel/rcu/rcu.h:118
     [<     inlined    >] rcu_process_callbacks+0x410/0xa70 rcu_do_batch kernel/rcu/tree.c:2669
     [<     inlined    >] rcu_process_callbacks+0x410/0xa70 invoke_rcu_callbacks kernel/rcu/tree.c:2937
     [<     inlined    >] rcu_process_callbacks+0x410/0xa70 __rcu_process_callbacks kernel/rcu/tree.c:2904
     [<ffffffff811044d0>] rcu_process_callbacks+0x410/0xa70 kernel/rcu/tree.c:2921
     [<ffffffff8108f18d>] __do_softirq+0xad/0x2d0 kernel/softirq.c:273
     [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
     [<ffffffff8108f528>] irq_exit+0x98/0xa0 kernel/softirq.c:391
     [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq ./arch/x86/include/asm/apic.h:655
     [<ffffffff8105cd33>] smp_apic_timer_interrupt+0x63/0x80 arch/x86/kernel/apic/apic.c:915
     [<ffffffff81e9661b>] apic_timer_interrupt+0x6b/0x70 arch/x86/entry/entry_64.S:782
     [<     inlined    >] complete+0x41/0x50 spin_unlock_irqrestore include/linux/spinlock.h:372
     [<ffffffff810d90b1>] complete+0x41/0x50 kernel/sched/completion.c:36
     [<ffffffff810b7f11>] kthread+0x131/0x180 kernel/kthread.c:200
     [<ffffffff81e95bdf>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
    DBG: cpu = ffff88063fc1fe68
    DBG: cpu id = 0

    Previous read of size 8 by thread T28 (K25):
     [<ffffffff810b4043>] put_pid+0x33/0xa0 kernel/pid.c:244
     [<ffffffff810874a8>] _do_fork+0x1a8/0x550 kernel/fork.c:1746
     [<ffffffff8108788c>] kernel_thread+0x3c/0x60 kernel/fork.c:1772
     [<ffffffff810a7e1c>] __call_usermodehelper+0x5c/0x90 kernel/kmod.c:317
     [<ffffffff810aebed>] process_one_work+0x2ad/0x6f0 kernel/workqueue.c:2036
     [<ffffffff810af769>] worker_thread+0xb9/0x730 kernel/workqueue.c:2170
     [<ffffffff810b7f41>] kthread+0x161/0x180 kernel/kthread.c:207
     [<ffffffff81e95bdf>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526


One put_pid indeed comes from rcu_process_callbacks, but the other
comes from _do_fork, and it is not in an RCU read-side critical section.



* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-17 18:09         ` Oleg Nesterov
  2015-09-17 18:38           ` Dmitry Vyukov
@ 2015-09-18  8:51           ` Peter Zijlstra
  2015-09-18  8:57             ` Peter Zijlstra
                               ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18  8:51 UTC
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton, Ingo Molnar,
	Paul McKenney, mhocko, LKML, ktsan, Kostya Serebryany,
	Andrey Konovalov, Alexander Potapenko, Hans Boehm

On Thu, Sep 17, 2015 at 08:09:19PM +0200, Oleg Nesterov wrote:
> On 09/17, Dmitry Vyukov wrote:
> >
> > I can update the patch description, but let me explain it here first.
> 
> Yes thanks.
> 
> > Here is the essence of what happens:
> 
> Aha, so you really meant that 2 put_pid's can race with each other,
> 
> > // thread 1
> > 1: pid->foo = 1; // foo is the first word of pid object
> > // then it does put_pid
> > 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
> > returns false so the function returns
> >
> > // thread 2
> > // executes put_pid
> > 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
> > // then kmem_cache_free does:
> > 4: *(void**)pid = head->freelist;
> > 5: head->freelist = (void*)pid;
> >
> > This can be executed as:
> >
> > 4: *(void**)pid = head->freelist;
> > 1: pid->foo = 1; // foo is the first word of pid object
> > 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
> > returns false so the function returns
> > 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
> > 5: head->freelist = (void*)pid;
> 
> Unless I am totally confused, everything is simpler. We can forget
> about the hoisting, freelist, etc.
> 
> Thread 2 can see the result of atomic_dec_and_test(), but not the
> result of "pid->foo = 1". In this case in can free the object which
> can be re-allocated _before_ STORE(pid->foo) completes. Of course,
> this would be really bad.
> 
> I need to recheck, but afaics this is not possible. This optimization
> is fine, but probably needs a comment.

For sure, this code doesn't make any sense to me.

> We rely on delayed_put_pid()
> called by RCU. And note that nobody can write to this pid after it
> is removed from the rcu-protected list.
> 
> So I think this is false alarm, but I'll try to recheck tomorrow, it
> is too late for me today.

As an alternative patch, could we not do:

  void put_pid(struct pid *pid)
  {
	struct pid_namespace *ns;

	if (!pid)
		return;

	ns = pid->numbers[pid->level].ns;
	if ((atomic_read(&pid->count) == 1) ||
	     atomic_dec_and_test(&pid->count)) {

+		smp_read_barrier_depends(); /* ctrl-dep */

		kmem_cache_free(ns->pid_cachep, pid);
		put_pid_ns(ns);
	}
  }

That would upgrade the atomic_read() path to a full READ_ONCE_CTRL(),
and thereby keep any of the kmem_cache_free() stores from leaking out.
And it's free, except on Alpha. Whereas the atomic_read_acquire() will
generate a full memory barrier on a whole bunch of archs.
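
For reference, READ_ONCE_CTRL() is roughly the following (from
include/linux/compiler.h, quoting from memory):

	#define READ_ONCE_CTRL(x) \
	({ \
		typeof(x) __val = READ_ONCE(x); \
		smp_read_barrier_depends(); /* Enforce control dependency. */ \
		__val; \
	})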


* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18  8:51           ` Peter Zijlstra
@ 2015-09-18  8:57             ` Peter Zijlstra
  2015-09-18  9:27               ` Peter Zijlstra
  2015-09-18 15:56               ` Paul E. McKenney
  2015-09-18  9:06             ` Dmitry Vyukov
  2015-09-18 13:28             ` Oleg Nesterov
  2 siblings, 2 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18  8:57 UTC
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton, Ingo Molnar,
	Paul McKenney, mhocko, LKML, ktsan, Kostya Serebryany,
	Andrey Konovalov, Alexander Potapenko, Hans Boehm

On Fri, Sep 18, 2015 at 10:51:56AM +0200, Peter Zijlstra wrote:
> That would upgrade the atomic_read() path to a full READ_ONCE_CTRL(),
> and thereby avoid any of the kmem_cache_free() stores from leaking out.

That said, on my TODO list is an item to review all atomic_{read,set}()
implementations to ensure they're (at least) {READ,WRITE}_ONCE().

A quick scan left me with this. There were indeed a few archs lacking
READ_ONCE (or ACCESS_ONCE) for atomic_read() and hardly anybody used
WRITE_ONCE() (or ACCESS_ONCE casting) for atomic_set().

---
 arch/alpha/include/asm/atomic.h        | 8 ++++----
 arch/arc/include/asm/atomic.h          | 8 ++++----
 arch/arm/include/asm/atomic.h          | 4 ++--
 arch/arm64/include/asm/atomic.h        | 2 +-
 arch/avr32/include/asm/atomic.h        | 4 ++--
 arch/frv/include/asm/atomic.h          | 4 ++--
 arch/h8300/include/asm/atomic.h        | 4 ++--
 arch/hexagon/include/asm/atomic.h      | 2 +-
 arch/ia64/include/asm/atomic.h         | 8 ++++----
 arch/m32r/include/asm/atomic.h         | 4 ++--
 arch/m68k/include/asm/atomic.h         | 4 ++--
 arch/metag/include/asm/atomic_lnkget.h | 3 ++-
 arch/metag/include/asm/atomic_lock1.h  | 2 +-
 arch/mips/include/asm/atomic.h         | 8 ++++----
 arch/mn10300/include/asm/atomic.h      | 4 ++--
 arch/parisc/include/asm/atomic.h       | 2 +-
 arch/sh/include/asm/atomic.h           | 4 ++--
 arch/sparc/include/asm/atomic_64.h     | 8 ++++----
 arch/tile/include/asm/atomic.h         | 2 +-
 arch/tile/include/asm/atomic_64.h      | 6 +++---
 arch/x86/include/asm/atomic.h          | 4 ++--
 arch/x86/include/asm/atomic64_64.h     | 4 ++--
 arch/xtensa/include/asm/atomic.h       | 4 ++--
 include/asm-generic/atomic.h           | 4 ++--
 24 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/arch/alpha/include/asm/atomic.h b/arch/alpha/include/asm/atomic.h
index e8c956098424..572b228c44c7 100644
--- a/arch/alpha/include/asm/atomic.h
+++ b/arch/alpha/include/asm/atomic.h
@@ -17,11 +17,11 @@
 #define ATOMIC_INIT(i)		{ (i) }
 #define ATOMIC64_INIT(i)	{ (i) }
 
-#define atomic_read(v)		ACCESS_ONCE((v)->counter)
-#define atomic64_read(v)	ACCESS_ONCE((v)->counter)
+#define atomic_read(v)		READ_ONCE((v)->counter)
+#define atomic64_read(v)	READ_ONCE((v)->counter)
 
-#define atomic_set(v,i)		((v)->counter = (i))
-#define atomic64_set(v,i)	((v)->counter = (i))
+#define atomic_set(v,i)		WRITE_ONCE((v)->counter, (i))
+#define atomic64_set(v,i)	WRITE_ONCE((v)->counter, (i))
 
 /*
  * To get proper branch prediction for the main line, we must branch
diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index c3ecda023e3a..7730d302cadb 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -17,11 +17,11 @@
 #include <asm/barrier.h>
 #include <asm/smp.h>
 
-#define atomic_read(v)  ((v)->counter)
+#define atomic_read(v)  READ_ONCE((v)->counter)
 
 #ifdef CONFIG_ARC_HAS_LLSC
 
-#define atomic_set(v, i) (((v)->counter) = (i))
+#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
 
 #ifdef CONFIG_ARC_STAR_9000923308
 
@@ -107,7 +107,7 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 #ifndef CONFIG_SMP
 
  /* violating atomic_xxx API locking protocol in UP for optimization sake */
-#define atomic_set(v, i) (((v)->counter) = (i))
+#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
 
 #else
 
@@ -125,7 +125,7 @@ static inline void atomic_set(atomic_t *v, int i)
 	unsigned long flags;
 
 	atomic_ops_lock(flags);
-	v->counter = i;
+	WRITE_ONCE(v->counter, i);
 	atomic_ops_unlock(flags);
 }
 
diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
index fe3ef397f5a4..2bf80afb7841 100644
--- a/arch/arm/include/asm/atomic.h
+++ b/arch/arm/include/asm/atomic.h
@@ -27,8 +27,8 @@
  * strex/ldrex monitor on some implementations. The reason we can use it for
  * atomic_set() is the clrex or dummy strex done on every exception return.
  */
-#define atomic_read(v)	ACCESS_ONCE((v)->counter)
-#define atomic_set(v,i)	(((v)->counter) = (i))
+#define atomic_read(v)	READ_ONCE((v)->counter)
+#define atomic_set(v,i)	WRITE_ONCE(((v)->counter), (i))
 
 #if __LINUX_ARM_ARCH__ >= 6
 
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index 35a67783cfa0..1e247ac2601a 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -54,7 +54,7 @@
 #define ATOMIC_INIT(i)	{ (i) }
 
 #define atomic_read(v)			READ_ONCE((v)->counter)
-#define atomic_set(v, i)		(((v)->counter) = (i))
+#define atomic_set(v, i)		WRITE_ONCE(((v)->counter), (i))
 #define atomic_xchg(v, new)		xchg(&((v)->counter), (new))
 #define atomic_cmpxchg(v, old, new)	cmpxchg(&((v)->counter), (old), (new))
 
diff --git a/arch/avr32/include/asm/atomic.h b/arch/avr32/include/asm/atomic.h
index 97c9bdf83409..d74fd8ce980a 100644
--- a/arch/avr32/include/asm/atomic.h
+++ b/arch/avr32/include/asm/atomic.h
@@ -19,8 +19,8 @@
 
 #define ATOMIC_INIT(i)  { (i) }
 
-#define atomic_read(v)		ACCESS_ONCE((v)->counter)
-#define atomic_set(v, i)	(((v)->counter) = i)
+#define atomic_read(v)		READ_ONCE((v)->counter)
+#define atomic_set(v, i)	WRITE_ONCE(((v)->counter), (i))
 
 #define ATOMIC_OP_RETURN(op, asm_op, asm_con)				\
 static inline int __atomic_##op##_return(int i, atomic_t *v)		\
diff --git a/arch/frv/include/asm/atomic.h b/arch/frv/include/asm/atomic.h
index 0da689def4cc..64f02d451aa8 100644
--- a/arch/frv/include/asm/atomic.h
+++ b/arch/frv/include/asm/atomic.h
@@ -32,8 +32,8 @@
  */
 
 #define ATOMIC_INIT(i)		{ (i) }
-#define atomic_read(v)		ACCESS_ONCE((v)->counter)
-#define atomic_set(v, i)	(((v)->counter) = (i))
+#define atomic_read(v)		READ_ONCE((v)->counter)
+#define atomic_set(v, i)	WRITE_ONCE(((v)->counter), (i))
 
 static inline int atomic_inc_return(atomic_t *v)
 {
diff --git a/arch/h8300/include/asm/atomic.h b/arch/h8300/include/asm/atomic.h
index 702ee539f87d..4435a445ae7e 100644
--- a/arch/h8300/include/asm/atomic.h
+++ b/arch/h8300/include/asm/atomic.h
@@ -11,8 +11,8 @@
 
 #define ATOMIC_INIT(i)	{ (i) }
 
-#define atomic_read(v)		ACCESS_ONCE((v)->counter)
-#define atomic_set(v, i)	(((v)->counter) = i)
+#define atomic_read(v)		READ_ONCE((v)->counter)
+#define atomic_set(v, i)	WRITE_ONCE(((v)->counter), (i))
 
 #include <linux/kernel.h>
 
diff --git a/arch/hexagon/include/asm/atomic.h b/arch/hexagon/include/asm/atomic.h
index 811d61f6422d..55696c4100d4 100644
--- a/arch/hexagon/include/asm/atomic.h
+++ b/arch/hexagon/include/asm/atomic.h
@@ -48,7 +48,7 @@ static inline void atomic_set(atomic_t *v, int new)
  *
  * Assumes all word reads on our architecture are atomic.
  */
-#define atomic_read(v)		((v)->counter)
+#define atomic_read(v)		READ_ONCE((v)->counter)
 
 /**
  * atomic_xchg - atomic
diff --git a/arch/ia64/include/asm/atomic.h b/arch/ia64/include/asm/atomic.h
index be4beeb77d57..8dfb5f6f6c35 100644
--- a/arch/ia64/include/asm/atomic.h
+++ b/arch/ia64/include/asm/atomic.h
@@ -21,11 +21,11 @@
 #define ATOMIC_INIT(i)		{ (i) }
 #define ATOMIC64_INIT(i)	{ (i) }
 
-#define atomic_read(v)		ACCESS_ONCE((v)->counter)
-#define atomic64_read(v)	ACCESS_ONCE((v)->counter)
+#define atomic_read(v)		READ_ONCE((v)->counter)
+#define atomic64_read(v)	READ_ONCE((v)->counter)
 
-#define atomic_set(v,i)		(((v)->counter) = (i))
-#define atomic64_set(v,i)	(((v)->counter) = (i))
+#define atomic_set(v,i)		WRITE_ONCE(((v)->counter), (i))
+#define atomic64_set(v,i)	WRITE_ONCE(((v)->counter), (i))
 
 #define ATOMIC_OP(op, c_op)						\
 static __inline__ int							\
diff --git a/arch/m32r/include/asm/atomic.h b/arch/m32r/include/asm/atomic.h
index 025e2a170493..ea35160d632b 100644
--- a/arch/m32r/include/asm/atomic.h
+++ b/arch/m32r/include/asm/atomic.h
@@ -28,7 +28,7 @@
  *
  * Atomically reads the value of @v.
  */
-#define atomic_read(v)	ACCESS_ONCE((v)->counter)
+#define atomic_read(v)	READ_ONCE((v)->counter)
 
 /**
  * atomic_set - set atomic variable
@@ -37,7 +37,7 @@
  *
  * Atomically sets the value of @v to @i.
  */
-#define atomic_set(v,i)	(((v)->counter) = (i))
+#define atomic_set(v,i)	WRITE_ONCE(((v)->counter), (i))
 
 #ifdef CONFIG_CHIP_M32700_TS1
 #define __ATOMIC_CLOBBER	, "r4"
diff --git a/arch/m68k/include/asm/atomic.h b/arch/m68k/include/asm/atomic.h
index 039fac120cc0..4858178260f9 100644
--- a/arch/m68k/include/asm/atomic.h
+++ b/arch/m68k/include/asm/atomic.h
@@ -17,8 +17,8 @@
 
 #define ATOMIC_INIT(i)	{ (i) }
 
-#define atomic_read(v)		ACCESS_ONCE((v)->counter)
-#define atomic_set(v, i)	(((v)->counter) = i)
+#define atomic_read(v)		READ_ONCE((v)->counter)
+#define atomic_set(v, i)	WRITE_ONCE(((v)->counter), (i))
 
 /*
  * The ColdFire parts cannot do some immediate to memory operations,
diff --git a/arch/metag/include/asm/atomic_lnkget.h b/arch/metag/include/asm/atomic_lnkget.h
index 21c4c268b86c..1bd21c933435 100644
--- a/arch/metag/include/asm/atomic_lnkget.h
+++ b/arch/metag/include/asm/atomic_lnkget.h
@@ -3,7 +3,8 @@
 
 #define ATOMIC_INIT(i)	{ (i) }
 
-#define atomic_set(v, i)		((v)->counter = (i))
+/* XXX: should be LNKSETD ? */
+#define atomic_set(v, i)		WRITE_ONCE((v)->counter, (i))
 
 #include <linux/compiler.h>
 
diff --git a/arch/metag/include/asm/atomic_lock1.h b/arch/metag/include/asm/atomic_lock1.h
index f8efe380fe8b..0295d9b8d5bf 100644
--- a/arch/metag/include/asm/atomic_lock1.h
+++ b/arch/metag/include/asm/atomic_lock1.h
@@ -10,7 +10,7 @@
 
 static inline int atomic_read(const atomic_t *v)
 {
-	return (v)->counter;
+	return READ_ONCE((v)->counter);
 }
 
 /*
diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 4c42fd9af777..f82d3af07931 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -30,7 +30,7 @@
  *
  * Atomically reads the value of @v.
  */
-#define atomic_read(v)		ACCESS_ONCE((v)->counter)
+#define atomic_read(v)		READ_ONCE((v)->counter)
 
 /*
  * atomic_set - set atomic variable
@@ -39,7 +39,7 @@
  *
  * Atomically sets the value of @v to @i.
  */
-#define atomic_set(v, i)		((v)->counter = (i))
+#define atomic_set(v, i)	WRITE_ONCE((v)->counter, (i))
 
 #define ATOMIC_OP(op, c_op, asm_op)					      \
 static __inline__ void atomic_##op(int i, atomic_t * v)			      \
@@ -315,14 +315,14 @@ static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
  * @v: pointer of type atomic64_t
  *
  */
-#define atomic64_read(v)	ACCESS_ONCE((v)->counter)
+#define atomic64_read(v)	READ_ONCE((v)->counter)
 
 /*
  * atomic64_set - set atomic variable
  * @v: pointer of type atomic64_t
  * @i: required value
  */
-#define atomic64_set(v, i)	((v)->counter = (i))
+#define atomic64_set(v, i)	WRITE_ONCE((v)->counter, (i))
 
 #define ATOMIC64_OP(op, c_op, asm_op)					      \
 static __inline__ void atomic64_##op(long i, atomic64_t * v)		      \
diff --git a/arch/mn10300/include/asm/atomic.h b/arch/mn10300/include/asm/atomic.h
index 375e59140c9c..ce318d5ab23b 100644
--- a/arch/mn10300/include/asm/atomic.h
+++ b/arch/mn10300/include/asm/atomic.h
@@ -34,7 +34,7 @@
  *
  * Atomically reads the value of @v.  Note that the guaranteed
  */
-#define atomic_read(v)	(ACCESS_ONCE((v)->counter))
+#define atomic_read(v)	READ_ONCE((v)->counter)
 
 /**
  * atomic_set - set atomic variable
@@ -43,7 +43,7 @@
  *
  * Atomically sets the value of @v to @i.  Note that the guaranteed
  */
-#define atomic_set(v, i) (((v)->counter) = (i))
+#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
 
 #define ATOMIC_OP(op)							\
 static inline void atomic_##op(int i, atomic_t *v)			\
diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
index 2536965d00ea..1d109990a022 100644
--- a/arch/parisc/include/asm/atomic.h
+++ b/arch/parisc/include/asm/atomic.h
@@ -67,7 +67,7 @@ static __inline__ void atomic_set(atomic_t *v, int i)
 
 static __inline__ int atomic_read(const atomic_t *v)
 {
-	return ACCESS_ONCE((v)->counter);
+	return READ_ONCE((v)->counter);
 }
 
 /* exported interface */
diff --git a/arch/sh/include/asm/atomic.h b/arch/sh/include/asm/atomic.h
index 05b9f74ce2d5..c399e1c55685 100644
--- a/arch/sh/include/asm/atomic.h
+++ b/arch/sh/include/asm/atomic.h
@@ -14,8 +14,8 @@
 
 #define ATOMIC_INIT(i)	{ (i) }
 
-#define atomic_read(v)		ACCESS_ONCE((v)->counter)
-#define atomic_set(v,i)		((v)->counter = (i))
+#define atomic_read(v)		READ_ONCE((v)->counter)
+#define atomic_set(v,i)		WRITE_ONCE((v)->counter, (i))
 
 #if defined(CONFIG_GUSA_RB)
 #include <asm/atomic-grb.h>
diff --git a/arch/sparc/include/asm/atomic_64.h b/arch/sparc/include/asm/atomic_64.h
index 917084ace49d..f2fbf9e16faf 100644
--- a/arch/sparc/include/asm/atomic_64.h
+++ b/arch/sparc/include/asm/atomic_64.h
@@ -14,11 +14,11 @@
 #define ATOMIC_INIT(i)		{ (i) }
 #define ATOMIC64_INIT(i)	{ (i) }
 
-#define atomic_read(v)		ACCESS_ONCE((v)->counter)
-#define atomic64_read(v)	ACCESS_ONCE((v)->counter)
+#define atomic_read(v)		READ_ONCE((v)->counter)
+#define atomic64_read(v)	READ_ONCE((v)->counter)
 
-#define atomic_set(v, i)	(((v)->counter) = i)
-#define atomic64_set(v, i)	(((v)->counter) = i)
+#define atomic_set(v, i)	WRITE_ONCE(((v)->counter), (i))
+#define atomic64_set(v, i)	WRITE_ONCE(((v)->counter), (i))
 
 #define ATOMIC_OP(op)							\
 void atomic_##op(int, atomic_t *);					\
diff --git a/arch/tile/include/asm/atomic.h b/arch/tile/include/asm/atomic.h
index 709798460763..9fc0107a9c5e 100644
--- a/arch/tile/include/asm/atomic.h
+++ b/arch/tile/include/asm/atomic.h
@@ -34,7 +34,7 @@
  */
 static inline int atomic_read(const atomic_t *v)
 {
-	return ACCESS_ONCE(v->counter);
+	return READ_ONCE(v->counter);
 }
 
 /**
diff --git a/arch/tile/include/asm/atomic_64.h b/arch/tile/include/asm/atomic_64.h
index 096a56d6ead4..51cabc26e387 100644
--- a/arch/tile/include/asm/atomic_64.h
+++ b/arch/tile/include/asm/atomic_64.h
@@ -24,7 +24,7 @@
 
 /* First, the 32-bit atomic ops that are "real" on our 64-bit platform. */
 
-#define atomic_set(v, i) ((v)->counter = (i))
+#define atomic_set(v, i) WRITE_ONCE((v)->counter, (i))
 
 /*
  * The smp_mb() operations throughout are to support the fact that
@@ -82,8 +82,8 @@ static inline void atomic_xor(int i, atomic_t *v)
 
 #define ATOMIC64_INIT(i)	{ (i) }
 
-#define atomic64_read(v)		((v)->counter)
-#define atomic64_set(v, i) ((v)->counter = (i))
+#define atomic64_read(v)	READ_ONCE((v)->counter)
+#define atomic64_set(v, i)	WRITE_ONCE((v)->counter, (i))
 
 static inline void atomic64_add(long i, atomic64_t *v)
 {
diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index fb52aa644aab..ae5fb83e6d91 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -24,7 +24,7 @@
  */
 static __always_inline int atomic_read(const atomic_t *v)
 {
-	return ACCESS_ONCE((v)->counter);
+	return READ_ONCE((v)->counter);
 }
 
 /**
@@ -36,7 +36,7 @@ static __always_inline int atomic_read(const atomic_t *v)
  */
 static __always_inline void atomic_set(atomic_t *v, int i)
 {
-	v->counter = i;
+	WRITE_ONCE(v->counter, i);
 }
 
 /**
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 50e33eff58de..037351022f54 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -18,7 +18,7 @@
  */
 static inline long atomic64_read(const atomic64_t *v)
 {
-	return ACCESS_ONCE((v)->counter);
+	return READ_ONCE((v)->counter);
 }
 
 /**
@@ -30,7 +30,7 @@ static inline long atomic64_read(const atomic64_t *v)
  */
 static inline void atomic64_set(atomic64_t *v, long i)
 {
-	v->counter = i;
+	WRITE_ONCE(v->counter, i);
 }
 
 /**
diff --git a/arch/xtensa/include/asm/atomic.h b/arch/xtensa/include/asm/atomic.h
index 93795d047303..fd8017ce298a 100644
--- a/arch/xtensa/include/asm/atomic.h
+++ b/arch/xtensa/include/asm/atomic.h
@@ -47,7 +47,7 @@
  *
  * Atomically reads the value of @v.
  */
-#define atomic_read(v)		ACCESS_ONCE((v)->counter)
+#define atomic_read(v)		READ_ONCE((v)->counter)
 
 /**
  * atomic_set - set atomic variable
@@ -56,7 +56,7 @@
  *
  * Atomically sets the value of @v to @i.
  */
-#define atomic_set(v,i)		((v)->counter = (i))
+#define atomic_set(v,i)		WRITE_ONCE((v)->counter, (i))
 
 #if XCHAL_HAVE_S32C1I
 #define ATOMIC_OP(op)							\
diff --git a/include/asm-generic/atomic.h b/include/asm-generic/atomic.h
index d4d7e337fdcb..74f1a3704d7a 100644
--- a/include/asm-generic/atomic.h
+++ b/include/asm-generic/atomic.h
@@ -127,7 +127,7 @@ ATOMIC_OP(xor, ^)
  * Atomically reads the value of @v.
  */
 #ifndef atomic_read
-#define atomic_read(v)	ACCESS_ONCE((v)->counter)
+#define atomic_read(v)	READ_ONCE((v)->counter)
 #endif
 
 /**
@@ -137,7 +137,7 @@ ATOMIC_OP(xor, ^)
  *
  * Atomically sets the value of @v to @i.
  */
-#define atomic_set(v, i) (((v)->counter) = (i))
+#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
 
 #include <linux/irqflags.h>
 


* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18  8:51           ` Peter Zijlstra
  2015-09-18  8:57             ` Peter Zijlstra
@ 2015-09-18  9:06             ` Dmitry Vyukov
  2015-09-18  9:28               ` Will Deacon
  2015-09-18 13:28             ` Oleg Nesterov
  2 siblings, 1 reply; 40+ messages in thread
From: Dmitry Vyukov @ 2015-09-18  9:06 UTC
  To: Peter Zijlstra, will.deacon
  Cc: Oleg Nesterov, ebiederm, Al Viro, Andrew Morton, Ingo Molnar,
	Paul McKenney, mhocko, LKML, ktsan, Kostya Serebryany,
	Andrey Konovalov, Alexander Potapenko, Hans Boehm

On Fri, Sep 18, 2015 at 10:51 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> [...]
>
> As an alternative patch, could we not do:
>
>   void put_pid(struct pid *pid)
>   {
>         struct pid_namespace *ns;
>
>         if (!pid)
>                 return;
>
>         ns = pid->numbers[pid->level].ns;
>         if ((atomic_read(&pid->count) == 1) ||
>              atomic_dec_and_test(&pid->count)) {
>
> +               smp_read_barrier_depends(); /* ctrl-dep */
>
>                 kmem_cache_free(ns->pid_cachep, pid);
>                 put_pid_ns(ns);
>         }
>   }
>
> That would upgrade the atomic_read() path to a full READ_ONCE_CTRL(),
> and thereby avoid any of the kmem_cache_free() stores from leaking out.
> And its free, except on Alpha. Whereas the atomic_read_acquire() will
> generate a full memory barrier on whole bunch of archs.


What you propose makes sense.

+Will, Paul

Can we have something along the lines of:

#define atomic_read_ctrl(v) READ_ONCE_CTRL(&(v)->counter)

then?

I've found a bunch of similar cases, e.g.:
https://groups.google.com/forum/#!topic/ktsan/YoU0yX2wQJU

They all would benefit from atomic_read_ctrl.
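
For example, the generic fast path in those cases would become
something like this (sketch only; obj/free_object are placeholders,
not real kernel symbols):

	if ((atomic_read_ctrl(&obj->count) == 1) ||
	     atomic_dec_and_test(&obj->count))
		free_object(obj); /* the freeing stores can no longer hoist above the read */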




-- 
Dmitry Vyukov, Software Engineer, dvyukov@google.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18  8:57             ` Peter Zijlstra
@ 2015-09-18  9:27               ` Peter Zijlstra
  2015-09-18 12:31                 ` James Hogan
  2015-09-18 15:56               ` Paul E. McKenney
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18  9:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton, Ingo Molnar,
	Paul McKenney, mhocko, LKML, ktsan, Kostya Serebryany,
	Andrey Konovalov, Alexander Potapenko, Hans Boehm, james.hogan

On Fri, Sep 18, 2015 at 10:57:32AM +0200, Peter Zijlstra wrote:
> diff --git a/arch/metag/include/asm/atomic_lnkget.h b/arch/metag/include/asm/atomic_lnkget.h
> index 21c4c268b86c..1bd21c933435 100644
> --- a/arch/metag/include/asm/atomic_lnkget.h
> +++ b/arch/metag/include/asm/atomic_lnkget.h
> @@ -3,7 +3,8 @@
>  
>  #define ATOMIC_INIT(i)	{ (i) }
>  
> -#define atomic_set(v, i)		((v)->counter = (i))
> +/* XXX: should be LNKSETD ? */
> +#define atomic_set(v, i)		WRITE_ONCE((v)->counter, (i))
>  
>  #include <linux/compiler.h>
>  

James?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18  9:06             ` Dmitry Vyukov
@ 2015-09-18  9:28               ` Will Deacon
  2015-09-18  9:33                 ` Peter Zijlstra
                                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Will Deacon @ 2015-09-18  9:28 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Peter Zijlstra, Oleg Nesterov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 10:06:46AM +0100, Dmitry Vyukov wrote:
> On Fri, Sep 18, 2015 at 10:51 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > As an alternative patch, could we not do:
> >
> >   void put_pid(struct pid *pid)
> >   {
> >         struct pid_namespace *ns;
> >
> >         if (!pid)
> >                 return;
> >
> >         ns = pid->numbers[pid->level].ns;
> >         if ((atomic_read(&pid->count) == 1) ||
> >              atomic_dec_and_test(&pid->count)) {
> >
> > +               smp_read_barrier_depends(); /* ctrl-dep */
> >
> >                 kmem_cache_free(ns->pid_cachep, pid);
> >                 put_pid_ns(ns);
> >         }
> >   }
> >
> > That would upgrade the atomic_read() path to a full READ_ONCE_CTRL(),
> > and thereby avoid any of the kmem_cache_free() stores from leaking out.
> > And its free, except on Alpha. Whereas the atomic_read_acquire() will
> > generate a full memory barrier on whole bunch of archs.
> 
> 
> What you propose makes sense.
> 
> +Will, Paul
> 
> Can we have something along the lines of:
> 
> #define atomic_read_ctrl(v) READ_ONCE_CTRL(&(v)->counter)

Funnily enough, I had this exact same discussion off-list yesterday
afternoon, since I wrote some code relying on a ctrl dependency from
an atomic_read to an atomic_xchg_relaxed.

So I guess I'm for the addition, but at the same time, could we make
atomic_read and atomic_set generic too?

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18  9:28               ` Will Deacon
@ 2015-09-18  9:33                 ` Peter Zijlstra
  2015-09-18 11:22                 ` Peter Zijlstra
  2015-09-18 15:57                 ` Paul E. McKenney
  2 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18  9:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: Dmitry Vyukov, Oleg Nesterov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 10:28:20AM +0100, Will Deacon wrote:
> > #define atomic_read_ctrl(v) READ_ONCE_CTRL(&(v)->counter)
> 
> Funnily enough, I had this exact same discussion off-list yesterday
> afternoon, since I wrote some code relying on a ctrl dependency from
> an atomic_read to an atomic_xchg_relaxed.
> 
> So I guess I'm for the addition, but at the same time, could we make
> atomic_read and atomic_set generic too?

Nope (having just gone through them); there are a few archs that implement
them in asm or even as outright function calls (see blackfin, metag,
powerpc, s390).

That said, the patch I just sent should see them all be at least
{READ,WRITE}_ONCE; the asm ones obviously qualify for that, etc.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18  9:28               ` Will Deacon
  2015-09-18  9:33                 ` Peter Zijlstra
@ 2015-09-18 11:22                 ` Peter Zijlstra
  2015-09-18 11:30                   ` Will Deacon
                                     ` (2 more replies)
  2015-09-18 15:57                 ` Paul E. McKenney
  2 siblings, 3 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18 11:22 UTC (permalink / raw)
  To: Will Deacon
  Cc: Dmitry Vyukov, Oleg Nesterov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 10:28:20AM +0100, Will Deacon wrote:
> On Fri, Sep 18, 2015 at 10:06:46AM +0100, Dmitry Vyukov wrote:

> > Can we have something along the lines of:
> > 
> > #define atomic_read_ctrl(v) READ_ONCE_CTRL(&(v)->counter)
> 
> Funnily enough, I had this exact same discussion off-list yesterday
> afternoon, since I wrote some code relying on a ctrl dependency from
> an atomic_read to an atomic_xchg_relaxed.

Something a little like so should work fine I suppose.

---
Subject: atomic: Implement atomic_read_ctrl()

Provide atomic_read_ctrl() to mirror READ_ONCE_CTRL(), such that we can
more conveniently use atomics in control dependencies.

Since we can assume atomic_read() implies a READ_ONCE(), we must only
emit an extra smp_read_barrier_depends() in order to upgrade to
READ_ONCE_CTRL() semantics.

Requested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/atomic.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 00a5763e850e..c4072421ea7f 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -4,6 +4,15 @@
 #include <asm/atomic.h>
 #include <asm/barrier.h>
 
+#ifndef atomic_read_ctrl
+static inline int atomic_read_ctrl(atomic_t *v)
+{
+	int val = atomic_read(v);
+	smp_read_barrier_depends(); /* Enforce control dependency. */
+	return val;
+}
+#endif
+
 /*
  * Relaxed variants of xchg, cmpxchg and some atomic operations.
  *

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 11:22                 ` Peter Zijlstra
@ 2015-09-18 11:30                   ` Will Deacon
  2015-09-18 11:50                   ` Dmitry Vyukov
  2015-09-18 16:15                   ` [PATCH] kernel: fix data race in put_pid Eric Dumazet
  2 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2015-09-18 11:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Oleg Nesterov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 12:22:52PM +0100, Peter Zijlstra wrote:
> On Fri, Sep 18, 2015 at 10:28:20AM +0100, Will Deacon wrote:
> > On Fri, Sep 18, 2015 at 10:06:46AM +0100, Dmitry Vyukov wrote:
> 
> > > Can we have something along the lines of:
> > > 
> > > #define atomic_read_ctrl(v) READ_ONCE_CTRL(&(v)->counter)
> > 
> > Funnily enough, I had this exact same discussion off-list yesterday
> > afternoon, since I wrote some code relying on a ctrl dependency from
> > an atomic_read to an atomic_xchg_relaxed.
> 
> Something a little like so should work fine I suppose.
> 
> ---
> Subject: atomic: Implement atomic_read_ctrl()
> 
> Provide atomic_read_ctrl() to mirror READ_ONCE_CTRL(), such that we can
> more conveniently use atomics in control dependencies.
> 
> Since we can assume atomic_read() implies a READ_ONCE(), we must only
> emit an extra smp_read_barrier_depends() in order to upgrade to
> READ_ONCE_CTRL() semantics.
> 
> Requested-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/atomic.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> index 00a5763e850e..c4072421ea7f 100644
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -4,6 +4,15 @@
>  #include <asm/atomic.h>
>  #include <asm/barrier.h>
>  
> +#ifndef atomic_read_ctrl
> +static inline int atomic_read_ctrl(atomic_t *v)
> +{
> +	int val = atomic_read(v);
> +	smp_read_barrier_depends(); /* Enforce control dependency. */
> +	return val;
> +}
> +#endif
> +
>  /*
>   * Relaxed variants of xchg, cmpxchg and some atomic operations.
>   *

Works for me:

  Acked-by: Will Deacon <will.deacon@arm.com>

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 11:22                 ` Peter Zijlstra
  2015-09-18 11:30                   ` Will Deacon
@ 2015-09-18 11:50                   ` Dmitry Vyukov
  2015-09-18 11:56                     ` Peter Zijlstra
  2015-09-18 16:15                   ` [PATCH] kernel: fix data race in put_pid Eric Dumazet
  2 siblings, 1 reply; 40+ messages in thread
From: Dmitry Vyukov @ 2015-09-18 11:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Oleg Nesterov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 1:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Sep 18, 2015 at 10:28:20AM +0100, Will Deacon wrote:
>> On Fri, Sep 18, 2015 at 10:06:46AM +0100, Dmitry Vyukov wrote:
>
>> > Can we have something along the lines of:
>> >
>> > #define atomic_read_ctrl(v) READ_ONCE_CTRL(&(v)->counter)
>>
>> Funnily enough, I had this exact same discussion off-list yesterday
>> afternoon, since I wrote some code relying on a ctrl dependency from
>> an atomic_read to an atomic_xchg_relaxed.
>
> Something a little like so should work fine I suppose.
>
> ---
> Subject: atomic: Implement atomic_read_ctrl()
>
> Provide atomic_read_ctrl() to mirror READ_ONCE_CTRL(), such that we can
> more conveniently use atomics in control dependencies.
>
> Since we can assume atomic_read() implies a READ_ONCE(), we must only
> emit an extra smp_read_barrier_depends() in order to upgrade to
> READ_ONCE_CTRL() semantics.
>
> Requested-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/atomic.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> index 00a5763e850e..c4072421ea7f 100644
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -4,6 +4,15 @@
>  #include <asm/atomic.h>
>  #include <asm/barrier.h>
>
> +#ifndef atomic_read_ctrl
> +static inline int atomic_read_ctrl(atomic_t *v)
> +{
> +       int val = atomic_read(v);
> +       smp_read_barrier_depends(); /* Enforce control dependency. */
> +       return val;
> +}
> +#endif
> +
>  /*
>   * Relaxed variants of xchg, cmpxchg and some atomic operations.
>   *

Looks good to me.
Should we add atomic64_read_ctrl for completeness? I have not seen
cases where it was needed, though.



-- 
Dmitry Vyukov, Software Engineer, dvyukov@google.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 11:50                   ` Dmitry Vyukov
@ 2015-09-18 11:56                     ` Peter Zijlstra
  2015-09-18 12:19                       ` Peter Zijlstra
                                         ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18 11:56 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Will Deacon, Oleg Nesterov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 01:50:01PM +0200, Dmitry Vyukov wrote:
> > +#ifndef atomic_read_ctrl
> > +static inline int atomic_read_ctrl(atomic_t *v)
> > +{
> > +       int val = atomic_read(v);
> > +       smp_read_barrier_depends(); /* Enforce control dependency. */
> > +       return val;
> > +}
> > +#endif
> > +
> >  /*
> >   * Relaxed variants of xchg, cmpxchg and some atomic operations.
> >   *
> 
> Looks good to me.
> Should we add atomic64_read_ctrl for completeness? I have not seen
> cases where it was needed, though.

Sure, and while doing another spin, let me go update the documentation
too.

---
Subject: atomic: Implement atomic_read_ctrl()
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri, 18 Sep 2015 13:22:52 +0200

Provide atomic_read_ctrl() to mirror READ_ONCE_CTRL(), such that we can
more conveniently use atomics in control dependencies.

Since we can assume atomic_read() implies a READ_ONCE(), we must only
emit an extra smp_read_barrier_depends() in order to upgrade to
READ_ONCE_CTRL() semantics.

Cc: oleg@redhat.com
Cc: torvalds@linux-foundation.org
Cc: will.deacon@arm.com
Cc: paulmck@linux.vnet.ibm.com
Requested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Documentation/memory-barriers.txt |   17 +++++++++--------
 include/linux/atomic.h            |   18 ++++++++++++++++++
 2 files changed, 27 insertions(+), 8 deletions(-)

--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -637,7 +637,8 @@ to optimize the original example by elim
 	b = p;  /* BUG: Compiler and CPU can both reorder!!! */
 
 Finally, the READ_ONCE_CTRL() includes an smp_read_barrier_depends()
-that DEC Alpha needs in order to respect control depedencies.
+that DEC Alpha needs in order to respect control depedencies. Alternatively
+use one of atomic{,64}_read_ctrl().
 
 So don't leave out the READ_ONCE_CTRL().
 
@@ -796,9 +797,9 @@ site: https://www.cl.cam.ac.uk/~pes20/pp
 
 In summary:
 
-  (*) Control dependencies must be headed by READ_ONCE_CTRL().
-      Or, as a much less preferable alternative, interpose
-      smp_read_barrier_depends() between a READ_ONCE() and the
+  (*) Control dependencies must be headed by READ_ONCE_CTRL(),
+      atomic{,64}_read_ctrl(). Or, as a much less preferable alternative,
+      interpose smp_read_barrier_depends() between a READ_ONCE() and the
       control-dependent write.
 
   (*) Control dependencies can order prior loads against later stores.
@@ -820,10 +821,10 @@ site: https://www.cl.cam.ac.uk/~pes20/pp
       and WRITE_ONCE() can help to preserve the needed conditional.
 
   (*) Control dependencies require that the compiler avoid reordering the
-      dependency into nonexistence.  Careful use of READ_ONCE_CTRL()
-      or smp_read_barrier_depends() can help to preserve your control
-      dependency.  Please see the Compiler Barrier section for more
-      information.
+      dependency into nonexistence.  Careful use of READ_ONCE_CTRL(),
+      atomic{,64}_read_ctrl() or smp_read_barrier_depends() can help to
+      preserve your control dependency.  Please see the Compiler Barrier
+      section for more information.
 
   (*) Control dependencies pair normally with other types of barriers.
 
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -4,6 +4,24 @@
 #include <asm/atomic.h>
 #include <asm/barrier.h>
 
+#ifndef atomic_read_ctrl
+static inline int atomic_read_ctrl(atomic_t *v)
+{
+	int val = atomic_read(v);
+	smp_read_barrier_depends(); /* Enforce control dependency. */
+	return val;
+}
+#endif
+
+#ifndef atomic64_read_ctrl
+static inline int atomic64_read_ctrl(atomic64_t *v)
+{
+	int val = atomic64_read(v);
+	smp_read_barrier_depends(); /* Enforce control dependency. */
+	return val;
+}
+#endif
+
 /*
  * Relaxed variants of xchg, cmpxchg and some atomic operations.
  *

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 11:56                     ` Peter Zijlstra
@ 2015-09-18 12:19                       ` Peter Zijlstra
  2015-09-18 12:44                         ` Will Deacon
  2015-09-18 13:44                       ` Oleg Nesterov
                                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18 12:19 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Will Deacon, Oleg Nesterov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 01:56:37PM +0200, Peter Zijlstra wrote:
> +#ifndef atomic64_read_ctrl
> +static inline int atomic64_read_ctrl(atomic64_t *v)
> +{
> +	int val = atomic64_read(v);

Duh

	long long...

> +	smp_read_barrier_depends(); /* Enforce control dependency. */
> +	return val;
> +}
> +#endif

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18  9:27               ` Peter Zijlstra
@ 2015-09-18 12:31                 ` James Hogan
  2015-09-18 12:34                   ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: James Hogan @ 2015-09-18 12:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

Hi Peter,

On Fri, Sep 18, 2015 at 11:27:32AM +0200, Peter Zijlstra wrote:
> On Fri, Sep 18, 2015 at 10:57:32AM +0200, Peter Zijlstra wrote:
> > diff --git a/arch/metag/include/asm/atomic_lnkget.h b/arch/metag/include/asm/atomic_lnkget.h
> > index 21c4c268b86c..1bd21c933435 100644
> > --- a/arch/metag/include/asm/atomic_lnkget.h
> > +++ b/arch/metag/include/asm/atomic_lnkget.h
> > @@ -3,7 +3,8 @@
> >  
> >  #define ATOMIC_INIT(i)	{ (i) }
> >  
> > -#define atomic_set(v, i)		((v)->counter = (i))
> > +/* XXX: should be LNKSETD ? */
> > +#define atomic_set(v, i)		WRITE_ONCE((v)->counter, (i))
> >  
> >  #include <linux/compiler.h>
> >  
> 
> James?

It should be fine without a LNKSET. The only reason LNKGET is required
for atomic_read() is that on certain cores the LNKGET/LNKSET
instructions went around the cache, but those cores also have
write-through caches, so it shouldn't result in incoherence.

I've just double checked, and the pipeline seems to handle the linked
read after write correctly, so e.g. after:

0220002c    MOV       D0FrT,#0x5
b6208042    SETD      [D1Ar3],D0FrT
ad1080cc    LNKGETD   D0Ar4,[D1Ar3]

D0Ar4 is correct (0x5).

Cheers
James

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 12:31                 ` James Hogan
@ 2015-09-18 12:34                   ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18 12:34 UTC (permalink / raw)
  To: James Hogan
  Cc: Oleg Nesterov, Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 01:31:27PM +0100, James Hogan wrote:
> Hi Peter,
> 
> On Fri, Sep 18, 2015 at 11:27:32AM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 18, 2015 at 10:57:32AM +0200, Peter Zijlstra wrote:
> > > diff --git a/arch/metag/include/asm/atomic_lnkget.h b/arch/metag/include/asm/atomic_lnkget.h
> > > index 21c4c268b86c..1bd21c933435 100644
> > > --- a/arch/metag/include/asm/atomic_lnkget.h
> > > +++ b/arch/metag/include/asm/atomic_lnkget.h
> > > @@ -3,7 +3,8 @@
> > >  
> > >  #define ATOMIC_INIT(i)	{ (i) }
> > >  
> > > -#define atomic_set(v, i)		((v)->counter = (i))
> > > +/* XXX: should be LNKSETD ? */
> > > +#define atomic_set(v, i)		WRITE_ONCE((v)->counter, (i))
> > >  
> > >  #include <linux/compiler.h>
> > >  
> > 
> > James?
> 
> It should be fine without a LNKSET. The only reason LNKGET is required
> for atomic_read() is that on certain cores the LNKGET/LNKSET
> instructions went around the cache, but those cores also have write
> through caches, so it shouldn't result in incoherence.
> 
> I've just double checked, and the pipeline seems to handle the linked
> read after write correctly, so e.g. after:
> 
> 0220002c    MOV       D0FrT,#0x5
> b6208042    SETD      [D1Ar3],D0FrT
> ad1080cc    LNKGETD   D0Ar4,[D1Ar3]
> 
> D0Ar4 is correct (0x5).

Thanks! XXX removed.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 12:19                       ` Peter Zijlstra
@ 2015-09-18 12:44                         ` Will Deacon
  2015-09-18 13:10                           ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: Will Deacon @ 2015-09-18 12:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Oleg Nesterov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 01:19:20PM +0100, Peter Zijlstra wrote:
> On Fri, Sep 18, 2015 at 01:56:37PM +0200, Peter Zijlstra wrote:
> > +#ifndef atomic64_read_ctrl
> > +static inline int atomic64_read_ctrl(atomic64_t *v)
> > +{
> > +	int val = atomic64_read(v);
> 
> Duh
> 
> 	long long...

...and atomic_long_read_ctrl, too?

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 12:44                         ` Will Deacon
@ 2015-09-18 13:10                           ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18 13:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Dmitry Vyukov, Oleg Nesterov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 01:44:42PM +0100, Will Deacon wrote:
> On Fri, Sep 18, 2015 at 01:19:20PM +0100, Peter Zijlstra wrote:
> > On Fri, Sep 18, 2015 at 01:56:37PM +0200, Peter Zijlstra wrote:
> > > +#ifndef atomic64_read_ctrl
> > > +static inline int atomic64_read_ctrl(atomic64_t *v)
> > > +{
> > > +	int val = atomic64_read(v);
> > 
> > Duh
> > 
> > 	long long...
> 
> ...and atomic_long_read_ctrl, too?

+ATOMIC_LONG_READ_OP(_ctrl)

done.
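
For reference, the wrapper it would generate looks roughly like this
(a sketch of the semantics, not the verbatim macro expansion):

	static inline long atomic_long_read_ctrl(atomic_long_t *l)
	{
		long val = atomic_long_read(l);
		smp_read_barrier_depends(); /* enforce control dependency */
		return val;
	}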

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18  8:51           ` Peter Zijlstra
  2015-09-18  8:57             ` Peter Zijlstra
  2015-09-18  9:06             ` Dmitry Vyukov
@ 2015-09-18 13:28             ` Oleg Nesterov
  2015-09-18 13:31               ` Oleg Nesterov
  2015-09-18 13:46               ` Peter Zijlstra
  2 siblings, 2 replies; 40+ messages in thread
From: Oleg Nesterov @ 2015-09-18 13:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton, Ingo Molnar,
	Paul McKenney, mhocko, LKML, ktsan, Kostya Serebryany,
	Andrey Konovalov, Alexander Potapenko, Hans Boehm

On 09/18, Peter Zijlstra wrote:
>
> On Thu, Sep 17, 2015 at 08:09:19PM +0200, Oleg Nesterov wrote:
>
> > I need to recheck, but afaics this is not possible. This optimization
> > is fine, but probably needs a comment.
>
> For sure, this code doesn't make any sense to me.

So yes, after sleeping on it I am starting to agree that in theory this
fast-path check is wrong. I'll write another email...

> As an alternative patch, could we not do:
>
>   void put_pid(struct pid *pid)
>   {
> 	struct pid_namespace *ns;
>
> 	if (!pid)
> 		return;
>
> 	ns = pid->numbers[pid->level].ns;
> 	if ((atomic_read(&pid->count) == 1) ||
> 	     atomic_dec_and_test(&pid->count)) {
>
> +		smp_read_barrier_depends(); /* ctrl-dep */

Not sure... Firstly, it is not clear what this barrier pairs with. And I
have to admit that I cannot understand whether the _CTRL() logic applies here.
The same for atomic_read_ctrl().

OK, please forget about put_pid() for the moment. Suppose we have

	X = 1;
	synchronize_sched();
	Y = 1;

Or
	X = 1;
	call_rcu_sched( func => { Y = 1; } );



Now. In theory this code is wrong:

	if (Y) {
		BUG_ON(X == 0);
	}

But this is correct:

	if (Y) {
		rcu_read_lock_sched();
		rcu_read_unlock_sched();
		BUG_ON(X == 0);
	}

So perhaps something like this

	/*
	 * Comment to explain it is eq to read_lock + read_unlock,
	 * in a sense that this guarantees a full barrier wrt to
	 * the previous synchronize_sched().
	 */
	#define rcu_read_barrier_sched()	barrier()

make sense?
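
In the example above it would then be used as (sketch):

	if (Y) {
		rcu_read_barrier_sched(); /* full barrier wrt the synchronize_sched() above */
		BUG_ON(X == 0);
	}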


And again, I simply can't understand whether this code is correct:

	if (READ_ONCE_CTRL(Y))
		BUG_ON(X == 0);

to me it does _not_ look correct in theory.
Oleg.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 13:28             ` Oleg Nesterov
@ 2015-09-18 13:31               ` Oleg Nesterov
  2015-09-18 13:46               ` Peter Zijlstra
  1 sibling, 0 replies; 40+ messages in thread
From: Oleg Nesterov @ 2015-09-18 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton, Ingo Molnar,
	Paul McKenney, mhocko, LKML, ktsan, Kostya Serebryany,
	Andrey Konovalov, Alexander Potapenko, Hans Boehm

Damn, sorry for noise,

On 09/18, Oleg Nesterov wrote:
>
> Now. In theory this code is wrong:
>
> 	if (Y) {
> 		BUG_ON(X == 0);
> 	}

Of course without READ_ONCE() or barrier() in between this code
is buggy in any case. But I hope you understand what I tried to
say...

Oleg.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 11:56                     ` Peter Zijlstra
  2015-09-18 12:19                       ` Peter Zijlstra
@ 2015-09-18 13:44                       ` Oleg Nesterov
  2015-09-18 13:49                         ` Peter Zijlstra
  2015-09-18 13:53                         ` Dmitry Vyukov
  2015-09-22  8:38                       ` Dmitry Vyukov
  2015-09-23  8:48                       ` [tip:locking/core] atomic: Implement atomic_read_ctrl() tip-bot for Peter Zijlstra
  3 siblings, 2 replies; 40+ messages in thread
From: Oleg Nesterov @ 2015-09-18 13:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, Will Deacon, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On 09/18, Peter Zijlstra wrote:
>
> Provide atomic_read_ctrl() to mirror READ_ONCE_CTRL(), such that we can
> more conveniently use atomics in control dependencies.
>
> Since we can assume atomic_read() implies a READ_ONCE(), we must only
> emit an extra smp_read_barrier_depends() in order to upgrade to
> READ_ONCE_CTRL() semantics.

...

> +static inline int atomic_read_ctrl(atomic_t *v)
> +{
> +	int val = atomic_read(v);
> +	smp_read_barrier_depends(); /* Enforce control dependency. */
> +	return val;
> +}

Help. I am starting to think that control dependencies are even harder
to understand than memory barriers...

So I assume that if we have

	int X = 0;
	atomic_t Y = ATOMIC_INIT(0);

	void w(void)
	{
		X = 1;
		atomic_inc_return(&Y);
	}

then

	void r(void)
	{
		if (atomic_read_ctrl(&Y))
			BUG_ON(X == 0);
	}

should be correct?  Why?

If not then I am even more confused.

Oleg.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 13:28             ` Oleg Nesterov
  2015-09-18 13:31               ` Oleg Nesterov
@ 2015-09-18 13:46               ` Peter Zijlstra
  2015-09-18 15:00                 ` Oleg Nesterov
                                   ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18 13:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton, Ingo Molnar,
	Paul McKenney, mhocko, LKML, ktsan, Kostya Serebryany,
	Andrey Konovalov, Alexander Potapenko, Hans Boehm

On Fri, Sep 18, 2015 at 03:28:44PM +0200, Oleg Nesterov wrote:
> On 09/18, Peter Zijlstra wrote:
> >
> > On Thu, Sep 17, 2015 at 08:09:19PM +0200, Oleg Nesterov wrote:
> >
> > > I need to recheck, but afaics this is not possible. This optimization
> > > is fine, but probably needs a comment.
> >
> > For sure, this code doesn't make any sense to me.
> 
> So yes, after sleeping on it I am starting to agree that in theory this
> fast-path check is wrong. I'll write another email...

This other mail will include a patch adding comments to pid.c? That
code didn't want to make sense to me this morning.

> > As an alternative patch, could we not do:
> >
> >   void put_pid(struct pid *pid)
> >   {
> > 	struct pid_namespace *ns;
> >
> > 	if (!pid)
> > 		return;
> >
> > 	ns = pid->numbers[pid->level].ns;
> > 	if ((atomic_read(&pid->count) == 1) ||
> > 	     atomic_dec_and_test(&pid->count)) {
> >
> > +		smp_read_barrier_depends(); /* ctrl-dep */
> 
> Not sure... Firstly, it is not clear what this barrier pairs with. And I
> have to admit that I cannot understand whether the _CTRL() logic applies here.
> The same for atomic_read_ctrl().

The control dependency barrier pairs with the full barrier of
atomic_dec_and_test.

So the two put_pid() instances:

	CPU0					CPU1

	pid->foo = 1;
	atomic_dec_and_test() == false		atomic_read_ctrl() == 1
						kmem_cache_free(pid)

CPU0 will modify a pid field and decrement, but not reach 0.
CPU1 finds we're the last, but must also be able to observe our foo
store such that we can rest assured  it is complete before we free the
storage.

The freeing of pid, on CPU1, is stores, these must not happen before we
satisfy the freeing condition, iow a load-store barrier, which is what
the control dependency provides.
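
In code, the pairing looks like this (illustrative sketch; "foo" stands
for any pid field, per Dmitry's example):

	/* CPU0 */
	pid->foo = 1;				/* store to the object */
	atomic_dec_and_test(&pid->count);	/* full barrier; 2 -> 1, returns false */

	/* CPU1 */
	if (atomic_read(&pid->count) == 1) {	/* observes 1 */
		smp_read_barrier_depends();	/* ctrl-dep: no stores above the load */
		kmem_cache_free(ns->pid_cachep, pid); /* stores into the object */
	}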

> OK, please forget about put_pid() for the moment. Suppose we have
> 
> 	X = 1;
> 	synchronize_sched();
> 	Y = 1;
> 
> Or
> 	X = 1;
> 	call_rcu_sched( func => { Y = 1; } );
> 
> 
> 
> > Now. In theory this code is wrong:
> 
> 	if (Y) {
> 		BUG_ON(X == 0);
> 	}
> 
> But this is correct:
> 
> 	if (Y) {
> 		rcu_read_lock_sched();
> 		rcu_read_unlock_sched();
> 		BUG_ON(X == 0);
> 	}
> 
> So perhaps something like this
> 
> 	/*
> 	 * Comment to explain it is eq to read_lock + read_unlock,
> 	 * in a sense that this guarantees a full barrier wrt to
> 	 * the previous synchronize_sched().
> 	 */
> 	#define rcu_read_barrier_sched()	barrier()
> 
> make sense?
> 
> 
> And again, I simply can't understand whether this code is correct:
> 
> 	if (READ_ONCE_CTRL(Y))
> 		BUG_ON(X == 0);
> 
> to me it does _not_ look correct in theory.

So control dependencies provide a load-store barrier. Your examples
above rely on a load-load barrier; BUG_ON(X == 0) is a load.

kmem_cache_free() OTOH is stores (we must modify the free list).
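
That is, roughly (per Dmitry's freelist example earlier in the thread):

	*(void **)pid = head->freelist;	/* store: overwrites the first word of pid */
	head->freelist = (void *)pid;	/* store: pushes pid onto the freelist */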

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 13:44                       ` Oleg Nesterov
@ 2015-09-18 13:49                         ` Peter Zijlstra
  2015-09-18 13:53                         ` Dmitry Vyukov
  1 sibling, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18 13:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Vyukov, Will Deacon, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 03:44:53PM +0200, Oleg Nesterov wrote:
> On 09/18, Peter Zijlstra wrote:
> > +static inline int atomic_read_ctrl(atomic_t *v)
> > +{
> > +	int val = atomic_read(v);
> > +	smp_read_barrier_depends(); /* Enforce control dependency. */
> > +	return val;
> > +}
> 
> Help. I am starting to think that control dependencies are even harder
> to understand than memory barriers...

Hehe, think of them as a load-store barrier; due to the 'impossibility'
of speculative stores (we'd see all kinds of random crap if you could
speculate stores).

> So I assume that if we have
> 
> 	int X = 0;
> 	atomic_t Y = ATOMIC_INIT(0);
> 
> 	void w(void)
> 	{
> 		X = 1;
> 		atomic_inc_return(&Y);
> 	}
> 
> then
> 
> 	void r(void)
> 	{
> 		if (atomic_read_ctrl(&Y))
> 			BUG_ON(X == 0);
> 	}
> 
> should be correct?  Why?

Nope, because it's (again) a load-load order you have there.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 13:44                       ` Oleg Nesterov
  2015-09-18 13:49                         ` Peter Zijlstra
@ 2015-09-18 13:53                         ` Dmitry Vyukov
  2015-09-18 14:41                           ` Oleg Nesterov
  1 sibling, 1 reply; 40+ messages in thread
From: Dmitry Vyukov @ 2015-09-18 13:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Will Deacon, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 3:44 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 09/18, Peter Zijlstra wrote:
>>
>> Provide atomic_read_ctrl() to mirror READ_ONCE_CTRL(), such that we can
>> more conveniently use atomics in control dependencies.
>>
>> Since we can assume atomic_read() implies a READ_ONCE(), we must only
>> emit an extra smp_read_barrier_depends() in order to upgrade to
>> READ_ONCE_CTRL() semantics.
>
> ...
>
>> +static inline int atomic_read_ctrl(atomic_t *v)
>> +{
>> +     int val = atomic_read(v);
>> +     smp_read_barrier_depends(); /* Enforce control dependency. */
>> +     return val;
>> +}
>
> Help. I am starting to think that the control dependencies is even more
> hard to understand that memory barriers...
>
> So I assume that if we have
>
>         int X = 0;
>         atomic_t Y = ATOMIC_INIT(0);
>
>         void w(void)
>         {
>                 X = 1;
>                 atomic_inc_return(&Y);
>         }
>
> then
>
>         void r(void)
>         {
>                 if (atomic_read_ctrl(&Y))
>                         BUG_ON(X == 0);
>         }
>
> should be correct?  Why?
>
> If not then I am even more confused.

This is not correct, because the "ctrl" barrier affects only
control-dependent stores. For reads the processor can still speculate,
that is, it can speculatively load X before loading Y. Control-dependent
loads require a full read/acquire memory barrier.
What will work is:

// thread 1
                 X = 1;
                 atomic_inc_return(&Y);

// thread 2
                 if (atomic_read_ctrl(&Y)) {
                         X = 2;
                         BUG_ON(X != 2);
                 }

Without the "ctrl" barrier the store X=2 could hoist above the load of
Y (on Alpha), and then X=1 can happen _after_ X=2, and the BUG_ON could
fire.
With the "ctrl" barrier the store X=2 is not allowed to hoist above the
load of Y.


-- 
Dmitry Vyukov, Software Engineer, dvyukov@google.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 13:53                         ` Dmitry Vyukov
@ 2015-09-18 14:41                           ` Oleg Nesterov
  0 siblings, 0 replies; 40+ messages in thread
From: Oleg Nesterov @ 2015-09-18 14:41 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Peter Zijlstra, Will Deacon, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On 09/18, Dmitry Vyukov wrote:
>
> On Fri, Sep 18, 2015 at 3:44 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > So I assume that if we have
> >
> >         int X = 0;
> >         atomic_t Y = ATOMIC_INIT(0);
> >
> >         void w(void)
> >         {
> >                 X = 1;
> >                 atomic_inc_return(&Y);
> >         }
> >
> > then
> >
> >         void r(void)
> >         {
> >                 if (atomic_read_ctrl(&Y))
> >                         BUG_ON(X == 0);
> >         }
> >
> > should be correct?  Why?
> >
> > If not then I am even more confused.
>
> This is not correct,

Good, because I wasn't able to understand why this could work.

> // thread 1
>                  X = 1;
>                  atomic_inc_return(&Y);
>
> // thread 2
>                  if (atomic_read_ctrl(&Y)) {
>                          X = 2;
>                          BUG_ON(X == 2);
>                   }

Thanks. This makes perfect sense to me.

And then I agree, atomic_read_ctrl() in put_pid() should fix the
theoretical problem.

Perhaps we can add this example to memory-barriers.txt... Although
perhaps it already explains/documents this case. I am afraid to open
it, it is huge and changes too often so every time it looks like a
new document to me ;)
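
FWIW, with the new helper the put_pid() fast path would presumably
read (sketch):

	ns = pid->numbers[pid->level].ns;
	if ((atomic_read_ctrl(&pid->count) == 1) ||
	     atomic_dec_and_test(&pid->count)) {
		kmem_cache_free(ns->pid_cachep, pid);
		put_pid_ns(ns);
	}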

Oleg.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 13:46               ` Peter Zijlstra
@ 2015-09-18 15:00                 ` Oleg Nesterov
  2015-09-18 15:30                 ` Oleg Nesterov
  2015-09-18 16:00                 ` Paul E. McKenney
  2 siblings, 0 replies; 40+ messages in thread
From: Oleg Nesterov @ 2015-09-18 15:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton, Ingo Molnar,
	Paul McKenney, mhocko, LKML, ktsan, Kostya Serebryany,
	Andrey Konovalov, Alexander Potapenko, Hans Boehm

On 09/18, Peter Zijlstra wrote:
>
> On Fri, Sep 18, 2015 at 03:28:44PM +0200, Oleg Nesterov wrote:
> > On 09/18, Peter Zijlstra wrote:
> > >
> > > 	ns = pid->numbers[pid->level].ns;
> > > 	if ((atomic_read(&pid->count) == 1) ||
> > > 	     atomic_dec_and_test(&pid->count)) {
> > >
> > > +		smp_read_barrier_depends(); /* ctrl-dep */
> >
> > Not sure... Firstly, it is not clear what this barrier pairs with. And I
> > have to admit that I cannot understand whether the _CTRL() logic applies here.
> > The same for atomic_read_ctrl().
>
> The control dependency barrier pairs with the full barrier of
> atomic_dec_and_test.

Yes thanks. I already got it. I hope ;)

> > OK, please forget about put_pid() for the moment. Suppose we have
> >
> > 	X = 1;
> > 	synchronize_sched();
> > 	Y = 1;
> >
> > Or
> > 	X = 1;
> > 	call_rcu_sched( func => { Y = 1; } );
> >
> >
> >
> > Now. In theory this this code is wrong:
> >
> > 	if (Y) {
> > 		BUG_ON(X == 0);
> > 	}
> >
> > But this is correct:
> >
> > 	if (Y) {
> > 		rcu_read_lock_sched();
> > 		rcu_read_unlock_sched();
> > 		BUG_ON(X == 0);
> > 	}
> >
> > So perhaps something like this
> >
> > 	/*
> > 	 * Comment to explain it is eq to read_lock + read_unlock,
> > 	 * in a sense that this guarantees a full barrier wrt to
> > 	 * the previous synchronize_sched().
> > 	 */
> > 	#define rcu_read_barrier_sched()	barrier()
> >
> > make sense?
> >
> >
> > And again, I simply can't understand if this code
> >
> > 	if (READ_ONCE_CTRL(Y))
> > 		BUG_ON(X == 0);
> >
> > to me it does _not_ look correct in theory.
>
> So control dependencies provide a load-store barrier. Your examples
> above rely on a load-load barrier; BUG_ON(X == 0) is a load.

Yes, yes...

What I tried to say is that we could fix it another way, and even look
at this problem from another angle. Not that I think it would be better
in this particular case, but still...

put_pid() could do

	if (atomic_read(&pid->count) == 1) {

		rcu_read_lock();
		rcu_read_unlock();

		kmem_cache_free(ns->pid_cachep, pid);
	}

if we observe atomic_read() == 1, we know that we have at least one
gp pass after all other writes to this memory (namely hlist_del_rcu()
which removes it from the rcu-list), because we can't see
atomic_read() == 1 until delayed_put_pid() (called by RCU) drops its
reference.

and perhaps this lock + unlock pair (which is a nop at least for _sched)
makes some sense in general...

Oleg.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 13:46               ` Peter Zijlstra
  2015-09-18 15:00                 ` Oleg Nesterov
@ 2015-09-18 15:30                 ` Oleg Nesterov
  2015-09-18 16:00                 ` Paul E. McKenney
  2 siblings, 0 replies; 40+ messages in thread
From: Oleg Nesterov @ 2015-09-18 15:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton, Ingo Molnar,
	Paul McKenney, mhocko, LKML, ktsan, Kostya Serebryany,
	Andrey Konovalov, Alexander Potapenko, Hans Boehm

I can't resist. You certainly don't need this spam, but

On 09/18, Peter Zijlstra wrote:
>
> On Fri, Sep 18, 2015 at 03:28:44PM +0200, Oleg Nesterov wrote:
> >
> > And again, I simply can't understand whether this code is correct:
> >
> > 	if (READ_ONCE_CTRL(Y))
> > 		BUG_ON(X == 0);
> >
> > to me it does _not_ look correct in theory.
>
> So control dependencies provide a load-store barrier. Your examples
> above rely on a load-load barrier; BUG_ON(X == 0) is a load.

Yes, and this is just obvious.

Yet I was confused because reading other emails I misunderstood the
proposed semantics of atomic_read_ctrl(), so I started to suspect
that in fact _CTRL() does more than I used to think.

Nevermind, sorry for noise, thanks to all.

Oleg.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18  8:57             ` Peter Zijlstra
  2015-09-18  9:27               ` Peter Zijlstra
@ 2015-09-18 15:56               ` Paul E. McKenney
  1 sibling, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2015-09-18 15:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, mhocko, LKML, ktsan, Kostya Serebryany,
	Andrey Konovalov, Alexander Potapenko, Hans Boehm

On Fri, Sep 18, 2015 at 10:57:32AM +0200, Peter Zijlstra wrote:
> On Fri, Sep 18, 2015 at 10:51:56AM +0200, Peter Zijlstra wrote:
> > That would upgrade the atomic_read() path to a full READ_ONCE_CTRL(),
> > and thereby avoid any of the kmem_cache_free() stores from leaking out.
> 
> That said, on my TODO list is an item to review all atomic_{read,set}()
> implementation to ensure they're (at least) {READ,WRITE}_ONCE().
> 
> A quick scan left me with this. There were indeed a few archs lacking
> READ_ONCE (or ACCESS_ONCE) for atomic_read() and hardly anybody used
> WRITE_ONCE() (or ACCESS_ONCE casting) for atomic_set().

You beat me to it!  ;-)

When you get a signed-off version, please feel free to add my Acked-by.

							Thanx, Paul

> ---
>  arch/alpha/include/asm/atomic.h        | 8 ++++----
>  arch/arc/include/asm/atomic.h          | 8 ++++----
>  arch/arm/include/asm/atomic.h          | 4 ++--
>  arch/arm64/include/asm/atomic.h        | 2 +-
>  arch/avr32/include/asm/atomic.h        | 4 ++--
>  arch/frv/include/asm/atomic.h          | 4 ++--
>  arch/h8300/include/asm/atomic.h        | 4 ++--
>  arch/hexagon/include/asm/atomic.h      | 2 +-
>  arch/ia64/include/asm/atomic.h         | 8 ++++----
>  arch/m32r/include/asm/atomic.h         | 4 ++--
>  arch/m68k/include/asm/atomic.h         | 4 ++--
>  arch/metag/include/asm/atomic_lnkget.h | 3 ++-
>  arch/metag/include/asm/atomic_lock1.h  | 2 +-
>  arch/mips/include/asm/atomic.h         | 8 ++++----
>  arch/mn10300/include/asm/atomic.h      | 4 ++--
>  arch/parisc/include/asm/atomic.h       | 2 +-
>  arch/sh/include/asm/atomic.h           | 4 ++--
>  arch/sparc/include/asm/atomic_64.h     | 8 ++++----
>  arch/tile/include/asm/atomic.h         | 2 +-
>  arch/tile/include/asm/atomic_64.h      | 6 +++---
>  arch/x86/include/asm/atomic.h          | 4 ++--
>  arch/x86/include/asm/atomic64_64.h     | 4 ++--
>  arch/xtensa/include/asm/atomic.h       | 4 ++--
>  include/asm-generic/atomic.h           | 4 ++--
>  24 files changed, 54 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/atomic.h b/arch/alpha/include/asm/atomic.h
> index e8c956098424..572b228c44c7 100644
> --- a/arch/alpha/include/asm/atomic.h
> +++ b/arch/alpha/include/asm/atomic.h
> @@ -17,11 +17,11 @@
>  #define ATOMIC_INIT(i)		{ (i) }
>  #define ATOMIC64_INIT(i)	{ (i) }
> 
> -#define atomic_read(v)		ACCESS_ONCE((v)->counter)
> -#define atomic64_read(v)	ACCESS_ONCE((v)->counter)
> +#define atomic_read(v)		READ_ONCE((v)->counter)
> +#define atomic64_read(v)	READ_ONCE((v)->counter)
> 
> -#define atomic_set(v,i)		((v)->counter = (i))
> -#define atomic64_set(v,i)	((v)->counter = (i))
> +#define atomic_set(v,i)		WRITE_ONCE((v)->counter, (i))
> +#define atomic64_set(v,i)	WRITE_ONCE((v)->counter, (i))
> 
>  /*
>   * To get proper branch prediction for the main line, we must branch
> diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
> index c3ecda023e3a..7730d302cadb 100644
> --- a/arch/arc/include/asm/atomic.h
> +++ b/arch/arc/include/asm/atomic.h
> @@ -17,11 +17,11 @@
>  #include <asm/barrier.h>
>  #include <asm/smp.h>
> 
> -#define atomic_read(v)  ((v)->counter)
> +#define atomic_read(v)  READ_ONCE((v)->counter)
> 
>  #ifdef CONFIG_ARC_HAS_LLSC
> 
> -#define atomic_set(v, i) (((v)->counter) = (i))
> +#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
> 
>  #ifdef CONFIG_ARC_STAR_9000923308
> 
> @@ -107,7 +107,7 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
>  #ifndef CONFIG_SMP
> 
>   /* violating atomic_xxx API locking protocol in UP for optimization sake */
> -#define atomic_set(v, i) (((v)->counter) = (i))
> +#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
> 
>  #else
> 
> @@ -125,7 +125,7 @@ static inline void atomic_set(atomic_t *v, int i)
>  	unsigned long flags;
> 
>  	atomic_ops_lock(flags);
> -	v->counter = i;
> +	WRITE_ONCE(v->counter, i);
>  	atomic_ops_unlock(flags);
>  }
> 
> diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> index fe3ef397f5a4..2bf80afb7841 100644
> --- a/arch/arm/include/asm/atomic.h
> +++ b/arch/arm/include/asm/atomic.h
> @@ -27,8 +27,8 @@
>   * strex/ldrex monitor on some implementations. The reason we can use it for
>   * atomic_set() is the clrex or dummy strex done on every exception return.
>   */
> -#define atomic_read(v)	ACCESS_ONCE((v)->counter)
> -#define atomic_set(v,i)	(((v)->counter) = (i))
> +#define atomic_read(v)	READ_ONCE((v)->counter)
> +#define atomic_set(v,i)	WRITE_ONCE(((v)->counter), (i))
> 
>  #if __LINUX_ARM_ARCH__ >= 6
> 
> diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
> index 35a67783cfa0..1e247ac2601a 100644
> --- a/arch/arm64/include/asm/atomic.h
> +++ b/arch/arm64/include/asm/atomic.h
> @@ -54,7 +54,7 @@
>  #define ATOMIC_INIT(i)	{ (i) }
> 
>  #define atomic_read(v)			READ_ONCE((v)->counter)
> -#define atomic_set(v, i)		(((v)->counter) = (i))
> +#define atomic_set(v, i)		WRITE_ONCE(((v)->counter), (i))
>  #define atomic_xchg(v, new)		xchg(&((v)->counter), (new))
>  #define atomic_cmpxchg(v, old, new)	cmpxchg(&((v)->counter), (old), (new))
> 
> diff --git a/arch/avr32/include/asm/atomic.h b/arch/avr32/include/asm/atomic.h
> index 97c9bdf83409..d74fd8ce980a 100644
> --- a/arch/avr32/include/asm/atomic.h
> +++ b/arch/avr32/include/asm/atomic.h
> @@ -19,8 +19,8 @@
> 
>  #define ATOMIC_INIT(i)  { (i) }
> 
> -#define atomic_read(v)		ACCESS_ONCE((v)->counter)
> -#define atomic_set(v, i)	(((v)->counter) = i)
> +#define atomic_read(v)		READ_ONCE((v)->counter)
> +#define atomic_set(v, i)	WRITE_ONCE(((v)->counter), (i))
> 
>  #define ATOMIC_OP_RETURN(op, asm_op, asm_con)				\
>  static inline int __atomic_##op##_return(int i, atomic_t *v)		\
> diff --git a/arch/frv/include/asm/atomic.h b/arch/frv/include/asm/atomic.h
> index 0da689def4cc..64f02d451aa8 100644
> --- a/arch/frv/include/asm/atomic.h
> +++ b/arch/frv/include/asm/atomic.h
> @@ -32,8 +32,8 @@
>   */
> 
>  #define ATOMIC_INIT(i)		{ (i) }
> -#define atomic_read(v)		ACCESS_ONCE((v)->counter)
> -#define atomic_set(v, i)	(((v)->counter) = (i))
> +#define atomic_read(v)		READ_ONCE((v)->counter)
> +#define atomic_set(v, i)	WRITE_ONCE(((v)->counter), (i))
> 
>  static inline int atomic_inc_return(atomic_t *v)
>  {
> diff --git a/arch/h8300/include/asm/atomic.h b/arch/h8300/include/asm/atomic.h
> index 702ee539f87d..4435a445ae7e 100644
> --- a/arch/h8300/include/asm/atomic.h
> +++ b/arch/h8300/include/asm/atomic.h
> @@ -11,8 +11,8 @@
> 
>  #define ATOMIC_INIT(i)	{ (i) }
> 
> -#define atomic_read(v)		ACCESS_ONCE((v)->counter)
> -#define atomic_set(v, i)	(((v)->counter) = i)
> +#define atomic_read(v)		READ_ONCE((v)->counter)
> +#define atomic_set(v, i)	WRITE_ONCE(((v)->counter), (i))
> 
>  #include <linux/kernel.h>
> 
> diff --git a/arch/hexagon/include/asm/atomic.h b/arch/hexagon/include/asm/atomic.h
> index 811d61f6422d..55696c4100d4 100644
> --- a/arch/hexagon/include/asm/atomic.h
> +++ b/arch/hexagon/include/asm/atomic.h
> @@ -48,7 +48,7 @@ static inline void atomic_set(atomic_t *v, int new)
>   *
>   * Assumes all word reads on our architecture are atomic.
>   */
> -#define atomic_read(v)		((v)->counter)
> +#define atomic_read(v)		READ_ONCE((v)->counter)
> 
>  /**
>   * atomic_xchg - atomic
> diff --git a/arch/ia64/include/asm/atomic.h b/arch/ia64/include/asm/atomic.h
> index be4beeb77d57..8dfb5f6f6c35 100644
> --- a/arch/ia64/include/asm/atomic.h
> +++ b/arch/ia64/include/asm/atomic.h
> @@ -21,11 +21,11 @@
>  #define ATOMIC_INIT(i)		{ (i) }
>  #define ATOMIC64_INIT(i)	{ (i) }
> 
> -#define atomic_read(v)		ACCESS_ONCE((v)->counter)
> -#define atomic64_read(v)	ACCESS_ONCE((v)->counter)
> +#define atomic_read(v)		READ_ONCE((v)->counter)
> +#define atomic64_read(v)	READ_ONCE((v)->counter)
> 
> -#define atomic_set(v,i)		(((v)->counter) = (i))
> -#define atomic64_set(v,i)	(((v)->counter) = (i))
> +#define atomic_set(v,i)		WRITE_ONCE(((v)->counter), (i))
> +#define atomic64_set(v,i)	WRITE_ONCE(((v)->counter), (i))
> 
>  #define ATOMIC_OP(op, c_op)						\
>  static __inline__ int							\
> diff --git a/arch/m32r/include/asm/atomic.h b/arch/m32r/include/asm/atomic.h
> index 025e2a170493..ea35160d632b 100644
> --- a/arch/m32r/include/asm/atomic.h
> +++ b/arch/m32r/include/asm/atomic.h
> @@ -28,7 +28,7 @@
>   *
>   * Atomically reads the value of @v.
>   */
> -#define atomic_read(v)	ACCESS_ONCE((v)->counter)
> +#define atomic_read(v)	READ_ONCE((v)->counter)
> 
>  /**
>   * atomic_set - set atomic variable
> @@ -37,7 +37,7 @@
>   *
>   * Atomically sets the value of @v to @i.
>   */
> -#define atomic_set(v,i)	(((v)->counter) = (i))
> +#define atomic_set(v,i)	WRITE_ONCE(((v)->counter), (i))
> 
>  #ifdef CONFIG_CHIP_M32700_TS1
>  #define __ATOMIC_CLOBBER	, "r4"
> diff --git a/arch/m68k/include/asm/atomic.h b/arch/m68k/include/asm/atomic.h
> index 039fac120cc0..4858178260f9 100644
> --- a/arch/m68k/include/asm/atomic.h
> +++ b/arch/m68k/include/asm/atomic.h
> @@ -17,8 +17,8 @@
> 
>  #define ATOMIC_INIT(i)	{ (i) }
> 
> -#define atomic_read(v)		ACCESS_ONCE((v)->counter)
> -#define atomic_set(v, i)	(((v)->counter) = i)
> +#define atomic_read(v)		READ_ONCE((v)->counter)
> +#define atomic_set(v, i)	WRITE_ONCE(((v)->counter), (i))
> 
>  /*
>   * The ColdFire parts cannot do some immediate to memory operations,
> diff --git a/arch/metag/include/asm/atomic_lnkget.h b/arch/metag/include/asm/atomic_lnkget.h
> index 21c4c268b86c..1bd21c933435 100644
> --- a/arch/metag/include/asm/atomic_lnkget.h
> +++ b/arch/metag/include/asm/atomic_lnkget.h
> @@ -3,7 +3,8 @@
> 
>  #define ATOMIC_INIT(i)	{ (i) }
> 
> -#define atomic_set(v, i)		((v)->counter = (i))
> +/* XXX: should be LNKSETD ? */
> +#define atomic_set(v, i)		WRITE_ONCE((v)->counter, (i))
> 
>  #include <linux/compiler.h>
> 
> diff --git a/arch/metag/include/asm/atomic_lock1.h b/arch/metag/include/asm/atomic_lock1.h
> index f8efe380fe8b..0295d9b8d5bf 100644
> --- a/arch/metag/include/asm/atomic_lock1.h
> +++ b/arch/metag/include/asm/atomic_lock1.h
> @@ -10,7 +10,7 @@
> 
>  static inline int atomic_read(const atomic_t *v)
>  {
> -	return (v)->counter;
> +	return READ_ONCE((v)->counter);
>  }
> 
>  /*
> diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
> index 4c42fd9af777..f82d3af07931 100644
> --- a/arch/mips/include/asm/atomic.h
> +++ b/arch/mips/include/asm/atomic.h
> @@ -30,7 +30,7 @@
>   *
>   * Atomically reads the value of @v.
>   */
> -#define atomic_read(v)		ACCESS_ONCE((v)->counter)
> +#define atomic_read(v)		READ_ONCE((v)->counter)
> 
>  /*
>   * atomic_set - set atomic variable
> @@ -39,7 +39,7 @@
>   *
>   * Atomically sets the value of @v to @i.
>   */
> -#define atomic_set(v, i)		((v)->counter = (i))
> +#define atomic_set(v, i)	WRITE_ONCE((v)->counter, (i))
> 
>  #define ATOMIC_OP(op, c_op, asm_op)					      \
>  static __inline__ void atomic_##op(int i, atomic_t * v)			      \
> @@ -315,14 +315,14 @@ static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
>   * @v: pointer of type atomic64_t
>   *
>   */
> -#define atomic64_read(v)	ACCESS_ONCE((v)->counter)
> +#define atomic64_read(v)	READ_ONCE((v)->counter)
> 
>  /*
>   * atomic64_set - set atomic variable
>   * @v: pointer of type atomic64_t
>   * @i: required value
>   */
> -#define atomic64_set(v, i)	((v)->counter = (i))
> +#define atomic64_set(v, i)	WRITE_ONCE((v)->counter, (i))
> 
>  #define ATOMIC64_OP(op, c_op, asm_op)					      \
>  static __inline__ void atomic64_##op(long i, atomic64_t * v)		      \
> diff --git a/arch/mn10300/include/asm/atomic.h b/arch/mn10300/include/asm/atomic.h
> index 375e59140c9c..ce318d5ab23b 100644
> --- a/arch/mn10300/include/asm/atomic.h
> +++ b/arch/mn10300/include/asm/atomic.h
> @@ -34,7 +34,7 @@
>   *
>   * Atomically reads the value of @v.  Note that the guaranteed
>   */
> -#define atomic_read(v)	(ACCESS_ONCE((v)->counter))
> +#define atomic_read(v)	READ_ONCE((v)->counter)
> 
>  /**
>   * atomic_set - set atomic variable
> @@ -43,7 +43,7 @@
>   *
>   * Atomically sets the value of @v to @i.  Note that the guaranteed
>   */
> -#define atomic_set(v, i) (((v)->counter) = (i))
> +#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
> 
>  #define ATOMIC_OP(op)							\
>  static inline void atomic_##op(int i, atomic_t *v)			\
> diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
> index 2536965d00ea..1d109990a022 100644
> --- a/arch/parisc/include/asm/atomic.h
> +++ b/arch/parisc/include/asm/atomic.h
> @@ -67,7 +67,7 @@ static __inline__ void atomic_set(atomic_t *v, int i)
> 
>  static __inline__ int atomic_read(const atomic_t *v)
>  {
> -	return ACCESS_ONCE((v)->counter);
> +	return READ_ONCE((v)->counter);
>  }
> 
>  /* exported interface */
> diff --git a/arch/sh/include/asm/atomic.h b/arch/sh/include/asm/atomic.h
> index 05b9f74ce2d5..c399e1c55685 100644
> --- a/arch/sh/include/asm/atomic.h
> +++ b/arch/sh/include/asm/atomic.h
> @@ -14,8 +14,8 @@
> 
>  #define ATOMIC_INIT(i)	{ (i) }
> 
> -#define atomic_read(v)		ACCESS_ONCE((v)->counter)
> -#define atomic_set(v,i)		((v)->counter = (i))
> +#define atomic_read(v)		READ_ONCE((v)->counter)
> +#define atomic_set(v,i)		WRITE_ONCE((v)->counter, (i))
> 
>  #if defined(CONFIG_GUSA_RB)
>  #include <asm/atomic-grb.h>
> diff --git a/arch/sparc/include/asm/atomic_64.h b/arch/sparc/include/asm/atomic_64.h
> index 917084ace49d..f2fbf9e16faf 100644
> --- a/arch/sparc/include/asm/atomic_64.h
> +++ b/arch/sparc/include/asm/atomic_64.h
> @@ -14,11 +14,11 @@
>  #define ATOMIC_INIT(i)		{ (i) }
>  #define ATOMIC64_INIT(i)	{ (i) }
> 
> -#define atomic_read(v)		ACCESS_ONCE((v)->counter)
> -#define atomic64_read(v)	ACCESS_ONCE((v)->counter)
> +#define atomic_read(v)		READ_ONCE((v)->counter)
> +#define atomic64_read(v)	READ_ONCE((v)->counter)
> 
> -#define atomic_set(v, i)	(((v)->counter) = i)
> -#define atomic64_set(v, i)	(((v)->counter) = i)
> +#define atomic_set(v, i)	WRITE_ONCE(((v)->counter), (i))
> +#define atomic64_set(v, i)	WRITE_ONCE(((v)->counter), (i))
> 
>  #define ATOMIC_OP(op)							\
>  void atomic_##op(int, atomic_t *);					\
> diff --git a/arch/tile/include/asm/atomic.h b/arch/tile/include/asm/atomic.h
> index 709798460763..9fc0107a9c5e 100644
> --- a/arch/tile/include/asm/atomic.h
> +++ b/arch/tile/include/asm/atomic.h
> @@ -34,7 +34,7 @@
>   */
>  static inline int atomic_read(const atomic_t *v)
>  {
> -	return ACCESS_ONCE(v->counter);
> +	return READ_ONCE(v->counter);
>  }
> 
>  /**
> diff --git a/arch/tile/include/asm/atomic_64.h b/arch/tile/include/asm/atomic_64.h
> index 096a56d6ead4..51cabc26e387 100644
> --- a/arch/tile/include/asm/atomic_64.h
> +++ b/arch/tile/include/asm/atomic_64.h
> @@ -24,7 +24,7 @@
> 
>  /* First, the 32-bit atomic ops that are "real" on our 64-bit platform. */
> 
> -#define atomic_set(v, i) ((v)->counter = (i))
> +#define atomic_set(v, i) WRITE_ONCE((v)->counter, (i))
> 
>  /*
>   * The smp_mb() operations throughout are to support the fact that
> @@ -82,8 +82,8 @@ static inline void atomic_xor(int i, atomic_t *v)
> 
>  #define ATOMIC64_INIT(i)	{ (i) }
> 
> -#define atomic64_read(v)		((v)->counter)
> -#define atomic64_set(v, i) ((v)->counter = (i))
> +#define atomic64_read(v)	READ_ONCE((v)->counter)
> +#define atomic64_set(v, i)	WRITE_ONCE((v)->counter, (i))
> 
>  static inline void atomic64_add(long i, atomic64_t *v)
>  {
> diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
> index fb52aa644aab..ae5fb83e6d91 100644
> --- a/arch/x86/include/asm/atomic.h
> +++ b/arch/x86/include/asm/atomic.h
> @@ -24,7 +24,7 @@
>   */
>  static __always_inline int atomic_read(const atomic_t *v)
>  {
> -	return ACCESS_ONCE((v)->counter);
> +	return READ_ONCE((v)->counter);
>  }
> 
>  /**
> @@ -36,7 +36,7 @@ static __always_inline int atomic_read(const atomic_t *v)
>   */
>  static __always_inline void atomic_set(atomic_t *v, int i)
>  {
> -	v->counter = i;
> +	WRITE_ONCE(v->counter, i);
>  }
> 
>  /**
> diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
> index 50e33eff58de..037351022f54 100644
> --- a/arch/x86/include/asm/atomic64_64.h
> +++ b/arch/x86/include/asm/atomic64_64.h
> @@ -18,7 +18,7 @@
>   */
>  static inline long atomic64_read(const atomic64_t *v)
>  {
> -	return ACCESS_ONCE((v)->counter);
> +	return READ_ONCE((v)->counter);
>  }
> 
>  /**
> @@ -30,7 +30,7 @@ static inline long atomic64_read(const atomic64_t *v)
>   */
>  static inline void atomic64_set(atomic64_t *v, long i)
>  {
> -	v->counter = i;
> +	WRITE_ONCE(v->counter, i);
>  }
> 
>  /**
> diff --git a/arch/xtensa/include/asm/atomic.h b/arch/xtensa/include/asm/atomic.h
> index 93795d047303..fd8017ce298a 100644
> --- a/arch/xtensa/include/asm/atomic.h
> +++ b/arch/xtensa/include/asm/atomic.h
> @@ -47,7 +47,7 @@
>   *
>   * Atomically reads the value of @v.
>   */
> -#define atomic_read(v)		ACCESS_ONCE((v)->counter)
> +#define atomic_read(v)		READ_ONCE((v)->counter)
> 
>  /**
>   * atomic_set - set atomic variable
> @@ -56,7 +56,7 @@
>   *
>   * Atomically sets the value of @v to @i.
>   */
> -#define atomic_set(v,i)		((v)->counter = (i))
> +#define atomic_set(v,i)		WRITE_ONCE((v)->counter, (i))
> 
>  #if XCHAL_HAVE_S32C1I
>  #define ATOMIC_OP(op)							\
> diff --git a/include/asm-generic/atomic.h b/include/asm-generic/atomic.h
> index d4d7e337fdcb..74f1a3704d7a 100644
> --- a/include/asm-generic/atomic.h
> +++ b/include/asm-generic/atomic.h
> @@ -127,7 +127,7 @@ ATOMIC_OP(xor, ^)
>   * Atomically reads the value of @v.
>   */
>  #ifndef atomic_read
> -#define atomic_read(v)	ACCESS_ONCE((v)->counter)
> +#define atomic_read(v)	READ_ONCE((v)->counter)
>  #endif
> 
>  /**
> @@ -137,7 +137,7 @@ ATOMIC_OP(xor, ^)
>   *
>   * Atomically sets the value of @v to @i.
>   */
> -#define atomic_set(v, i) (((v)->counter) = (i))
> +#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
> 
>  #include <linux/irqflags.h>
> 
> 

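A userspace sketch of what the conversion above provides (the MY_* names
are invented here for illustration; this is a sketch, not the kernel's
actual definitions): a volatile access forces the compiler to emit
exactly one untorn load or store, so it can no longer tear, fuse,
duplicate or invent plain accesses to ->counter.

	#define MY_READ_ONCE(x)		(*(const volatile typeof(x) *)&(x))
	#define MY_WRITE_ONCE(x, val)	(*(volatile typeof(x) *)&(x) = (val))

	typedef struct { int counter; } my_atomic_t;

	static inline int my_atomic_read(const my_atomic_t *v)
	{
		return MY_READ_ONCE(v->counter);	/* exactly one load */
	}

	static inline void my_atomic_set(my_atomic_t *v, int i)
	{
		MY_WRITE_ONCE(v->counter, i);		/* exactly one store */
	}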

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18  9:28               ` Will Deacon
  2015-09-18  9:33                 ` Peter Zijlstra
  2015-09-18 11:22                 ` Peter Zijlstra
@ 2015-09-18 15:57                 ` Paul E. McKenney
  2 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2015-09-18 15:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: Dmitry Vyukov, Peter Zijlstra, Oleg Nesterov, ebiederm, Al Viro,
	Andrew Morton, Ingo Molnar, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 10:28:20AM +0100, Will Deacon wrote:
> On Fri, Sep 18, 2015 at 10:06:46AM +0100, Dmitry Vyukov wrote:
> > On Fri, Sep 18, 2015 at 10:51 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > As an alternative patch, could we not do:
> > >
> > >   void put_pid(struct pid *pid)
> > >   {
> > >         struct pid_namespace *ns;
> > >
> > >         if (!pid)
> > >                 return;
> > >
> > >         ns = pid->numbers[pid->level].ns;
> > >         if ((atomic_read(&pid->count) == 1) ||
> > >              atomic_dec_and_test(&pid->count)) {
> > >
> > > +               smp_read_barrier_depends(); /* ctrl-dep */
> > >
> > >                 kmem_cache_free(ns->pid_cachep, pid);
> > >                 put_pid_ns(ns);
> > >         }
> > >   }
> > >
> > > That would upgrade the atomic_read() path to a full READ_ONCE_CTRL(),
> > > and thereby prevent any of the kmem_cache_free() stores from leaking out.
> > > And it's free, except on Alpha, whereas atomic_read_acquire() will
> > > generate a full memory barrier on a whole bunch of archs.
> > 
> > 
> > What you propose makes sense.
> > 
> > +Will, Paul
> > 
> > Can we have something along the lines of:
> > 
> > #define atomic_read_ctrl(v) READ_ONCE_CTRL(&(v)->counter)
> 
> Funnily enough, I had this exact same discussion off-list yesterday
> afternoon, since I wrote some code relying on a ctrl dependency from
> an atomic_read to an atomic_xchg_relaxed.
> 
> So I guess I'm for the addition, but at the same time, could we make
> atomic_read and atomic_set generic too?

If we have some places that could use it, by all means let's add it!

							Thanx, Paul

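For reference, READ_ONCE_CTRL() as discussed in this thread was, in
essence (a sketch of the definition of that era, reconstructed rather
than quoted from the thread):

	#define READ_ONCE_CTRL(x)					\
	({								\
		typeof(x) __val = READ_ONCE(x);				\
		smp_read_barrier_depends(); /* Enforce control dependency. */ \
		__val;							\
	})

so an atomic_read_ctrl() along the lines Dmitry proposes only has to add
the same smp_read_barrier_depends() after a plain atomic_read().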

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 13:46               ` Peter Zijlstra
  2015-09-18 15:00                 ` Oleg Nesterov
  2015-09-18 15:30                 ` Oleg Nesterov
@ 2015-09-18 16:00                 ` Paul E. McKenney
  2 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2015-09-18 16:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Dmitry Vyukov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, mhocko, LKML, ktsan, Kostya Serebryany,
	Andrey Konovalov, Alexander Potapenko, Hans Boehm

On Fri, Sep 18, 2015 at 03:46:30PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 18, 2015 at 03:28:44PM +0200, Oleg Nesterov wrote:
> > On 09/18, Peter Zijlstra wrote:
> > >
> > > On Thu, Sep 17, 2015 at 08:09:19PM +0200, Oleg Nesterov wrote:
> > >
> > > > I need to recheck, but afaics this is not possible. This optimization
> > > > is fine, but probably needs a comment.
> > >
> > > For sure, this code doesn't make any sense to me.
> > 
> > So yes, after a sleep I am starting to agree that in theory this fast-path
> > check is wrong. I'll write another email.
> 
> This other mail will include a patch adding comments to pid.c? That
> code didn't want to make sense to me this morning.
> 
> > > As an alternative patch, could we not do:
> > >
> > >   void put_pid(struct pid *pid)
> > >   {
> > > 	struct pid_namespace *ns;
> > >
> > > 	if (!pid)
> > > 		return;
> > >
> > > 	ns = pid->numbers[pid->level].ns;
> > > 	if ((atomic_read(&pid->count) == 1) ||
> > > 	     atomic_dec_and_test(&pid->count)) {
> > >
> > > +		smp_read_barrier_depends(); /* ctrl-dep */
> > 
> > Not sure... Firstly, it is not clear what this barrier pairs with. And I
> > have to admit that I cannot understand whether the _CTRL() logic applies
> > here. The same goes for atomic_read_ctrl().
> 
> The control dependency barrier pairs with the full barrier of
> atomic_dec_and_test.
> 
> So the two put_pid() instances:
> 
> 	CPU0					CPU1
> 
> 	pid->foo = 1;
> 	atomic_dec_and_test() == false		atomic_read_ctrl() == 1
> 						kmem_cache_free(pid)
> 
> CPU0 will modify a pid field and decrement, but not reach 0.
> CPU1 finds we're the last, but must also be able to observe our foo
> store such that we can rest assured it is complete before we free the
> storage.
> 
> The freeing of pid, on CPU1, is stores; these must not happen before we
> satisfy the freeing condition. IOW we need a load-store barrier, which is
> what the control dependency provides.
> 
> > OK, please forget about put_pid() for the moment. Suppose we have
> > 
> > 	X = 1;
> > 	synchronize_sched();
> > 	Y = 1;
> > 
> > Or
> > 	X = 1;
> > 	call_rcu_sched( func => { Y = 1; } );
> > 
> > 
> > 
> > Now. In theory this code is wrong:
> > 
> > 	if (Y) {
> > 		BUG_ON(X == 0);
> > 	}
> > 
> > But this is correct:
> > 
> > 	if (Y) {
> > 		rcu_read_lock_sched();
> > 		rcu_read_unlock_sched();
> > 		BUG_ON(X == 0);
> > 	}
> > 
> > So perhaps something like this
> > 
> > 	/*
> > 	 * Comment to explain that this is equivalent to read_lock +
> > 	 * read_unlock, in the sense that it guarantees a full barrier
> > 	 * wrt the previous synchronize_sched().
> > 	 */
> > 	#define rcu_read_barrier_sched()	barrier()
> > 
> > make sense?
> > 
> > 
> > And again, I simply can't understand whether this code is correct:
> > 
> > 	if (READ_ONCE_CTRL(Y))
> > 		BUG_ON(X == 0);
> > 
> > To me it does _not_ look correct in theory.
> 
> So control dependencies provide a load-store barrier. Your examples
> above rely on a load-load barrier; BUG_ON(X == 0) is a load.
> 
> kmem_cache_free() OTOH is stores (we must modify the free list).

And any reads are bogus, so ordering with writes suffices.  Good!

							Thanx, Paul

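To make the load-store/load-load distinction above concrete, a sketch
(X, Y and Z stand for ordinary shared variables):

	/* Load-store: ordered.  The conditional store cannot become
	 * visible before the load of Y has resolved: */
	if (READ_ONCE_CTRL(Y))
		WRITE_ONCE(Z, 1);

	/* Load-load: NOT ordered by the control dependency alone; the
	 * CPU may satisfy the load of X speculatively, before Y has
	 * resolved, which is why the BUG_ON() example above is not
	 * saved by READ_ONCE_CTRL(): */
	if (READ_ONCE_CTRL(Y))
		BUG_ON(READ_ONCE(X) == 0);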

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 11:22                 ` Peter Zijlstra
  2015-09-18 11:30                   ` Will Deacon
  2015-09-18 11:50                   ` Dmitry Vyukov
@ 2015-09-18 16:15                   ` Eric Dumazet
  2015-09-18 16:20                     ` Peter Zijlstra
  2 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2015-09-18 16:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Dmitry Vyukov, Oleg Nesterov, ebiederm, Al Viro,
	Andrew Morton, Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, 2015-09-18 at 13:22 +0200, Peter Zijlstra wrote:

...

>  
> +#ifndef atomic_read_ctrl
> +static inline int atomic_read_ctrl(atomic_t *v)

const atomic_t *v

> +{
> +	int val = atomic_read(v);
> +	smp_read_barrier_depends(); /* Enforce control dependency. */
> +	return val;
> +}
> +#endif


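The smp_read_barrier_depends() in the helper quoted above is what makes
it nearly free: it is a real barrier only on DEC Alpha and compiles away
everywhere else. A sketch of the per-architecture expansion at the time
(reconstructed, not quoted from this thread):

	/* arch/alpha: */
	#define read_barrier_depends()		__asm__ __volatile__("mb" : : : "memory")
	#define smp_read_barrier_depends()	read_barrier_depends()

	/* asm-generic, i.e. every other architecture: */
	#define smp_read_barrier_depends()	do { } while (0)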


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 16:15                   ` [PATCH] kernel: fix data race in put_pid Eric Dumazet
@ 2015-09-18 16:20                     ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-09-18 16:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Will Deacon, Dmitry Vyukov, Oleg Nesterov, ebiederm, Al Viro,
	Andrew Morton, Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

On Fri, Sep 18, 2015 at 09:15:34AM -0700, Eric Dumazet wrote:
> On Fri, 2015-09-18 at 13:22 +0200, Peter Zijlstra wrote:

> > +static inline int atomic_read_ctrl(atomic_t *v)
> 
> const atomic_t *v

Good point, lemme also fix up ATOMIC_LONG_READ_OP().

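A sketch of why the const qualifier matters for callers
(pid_count_peek() is invented here purely for illustration):

	static int pid_count_peek(const struct pid *pid)
	{
		/* &pid->count has type 'const atomic_t *'.  Without the
		 * const qualifier on atomic_read_ctrl()'s parameter this
		 * call would need a cast; with it, read-only users work
		 * as-is. */
		return atomic_read_ctrl(&pid->count);
	}
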
^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] kernel: fix data race in put_pid
  2015-09-18 11:56                     ` Peter Zijlstra
  2015-09-18 12:19                       ` Peter Zijlstra
  2015-09-18 13:44                       ` Oleg Nesterov
@ 2015-09-22  8:38                       ` Dmitry Vyukov
  2015-09-23  8:48                       ` [tip:locking/core] atomic: Implement atomic_read_ctrl() tip-bot for Peter Zijlstra
  3 siblings, 0 replies; 40+ messages in thread
From: Dmitry Vyukov @ 2015-09-22  8:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Oleg Nesterov, ebiederm, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko, LKML, ktsan,
	Kostya Serebryany, Andrey Konovalov, Alexander Potapenko,
	Hans Boehm

Should I wait until the atomic_read_ctrl patch has landed and then update my patch?


On Fri, Sep 18, 2015 at 1:56 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Sep 18, 2015 at 01:50:01PM +0200, Dmitry Vyukov wrote:
>> > +#ifndef atomic_read_ctrl
>> > +static inline int atomic_read_ctrl(atomic_t *v)
>> > +{
>> > +       int val = atomic_read(v);
>> > +       smp_read_barrier_depends(); /* Enforce control dependency. */
>> > +       return val;
>> > +}
>> > +#endif
>> > +
>> >  /*
>> >   * Relaxed variants of xchg, cmpxchg and some atomic operations.
>> >   *
>>
>> Looks good to me.
>> Should we add atomic64_read_ctrl for completeness? I have not seen
>> cases where it was needed, though.
>
> Sure, and while doing another spin, let me go update the documentation
> too.
>
> ---
> Subject: atomic: Implement atomic_read_ctrl()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri, 18 Sep 2015 13:22:52 +0200
>
> Provide atomic_read_ctrl() to mirror READ_ONCE_CTRL(), such that we can
> more conveniently use atomics in control dependencies.
>
> Since we can assume atomic_read() implies a READ_ONCE(), we must only
> emit an extra smp_read_barrier_depends() in order to upgrade to
> READ_ONCE_CTRL() semantics.
>
> Cc: oleg@redhat.com
> Cc: torvalds@linux-foundation.org
> Cc: will.deacon@arm.com
> Cc: paulmck@linux.vnet.ibm.com
> Requested-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  Documentation/memory-barriers.txt |   17 +++++++++--------
>  include/linux/atomic.h            |   18 ++++++++++++++++++
>  2 files changed, 27 insertions(+), 8 deletions(-)
>
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -637,7 +637,8 @@ to optimize the original example by elim
>         b = p;  /* BUG: Compiler and CPU can both reorder!!! */
>
>  Finally, the READ_ONCE_CTRL() includes an smp_read_barrier_depends()
> -that DEC Alpha needs in order to respect control depedencies.
> > +that DEC Alpha needs in order to respect control dependencies. Alternatively
> +use one of atomic{,64}_read_ctrl().
>
>  So don't leave out the READ_ONCE_CTRL().
>
> @@ -796,9 +797,9 @@ site: https://www.cl.cam.ac.uk/~pes20/pp
>
>  In summary:
>
> -  (*) Control dependencies must be headed by READ_ONCE_CTRL().
> -      Or, as a much less preferable alternative, interpose
> -      smp_read_barrier_depends() between a READ_ONCE() and the
> +  (*) Control dependencies must be headed by READ_ONCE_CTRL(),
> +      atomic{,64}_read_ctrl(). Or, as a much less preferable alternative,
> +      interpose smp_read_barrier_depends() between a READ_ONCE() and the
>        control-dependent write.
>
>    (*) Control dependencies can order prior loads against later stores.
> @@ -820,10 +821,10 @@ site: https://www.cl.cam.ac.uk/~pes20/pp
>        and WRITE_ONCE() can help to preserve the needed conditional.
>
>    (*) Control dependencies require that the compiler avoid reordering the
> -      dependency into nonexistence.  Careful use of READ_ONCE_CTRL()
> -      or smp_read_barrier_depends() can help to preserve your control
> -      dependency.  Please see the Compiler Barrier section for more
> -      information.
> +      dependency into nonexistence.  Careful use of READ_ONCE_CTRL(),
> +      atomic{,64}_read_ctrl() or smp_read_barrier_depends() can help to
> +      preserve your control dependency.  Please see the Compiler Barrier
> +      section for more information.
>
>    (*) Control dependencies pair normally with other types of barriers.
>
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -4,6 +4,24 @@
>  #include <asm/atomic.h>
>  #include <asm/barrier.h>
>
> +#ifndef atomic_read_ctrl
> +static inline int atomic_read_ctrl(atomic_t *v)
> +{
> +       int val = atomic_read(v);
> +       smp_read_barrier_depends(); /* Enforce control dependency. */
> +       return val;
> +}
> +#endif
> +
> +#ifndef atomic64_read_ctrl
> > +static inline long long atomic64_read_ctrl(atomic64_t *v)
> > +{
> > +       long long val = atomic64_read(v);
> +       smp_read_barrier_depends(); /* Enforce control dependency. */
> +       return val;
> +}
> +#endif
> +
>  /*
>   * Relaxed variants of xchg, cmpxchg and some atomic operations.
>   *



-- 
Dmitry Vyukov, Software Engineer, dvyukov@google.com
Google Germany GmbH, Dienerstraße 12, 80331, München
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [tip:locking/core] atomic: Implement atomic_read_ctrl()
  2015-09-18 11:56                     ` Peter Zijlstra
                                         ` (2 preceding siblings ...)
  2015-09-22  8:38                       ` Dmitry Vyukov
@ 2015-09-23  8:48                       ` tip-bot for Peter Zijlstra
  3 siblings, 0 replies; 40+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-09-23  8:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, linux-kernel, tglx, peterz, paulmck, will.deacon, hpa,
	dvyukov, mingo, akpm

Commit-ID:  e3e72ab80a3fac0b88e07d358a2c75724ccd66b4
Gitweb:     http://git.kernel.org/tip/e3e72ab80a3fac0b88e07d358a2c75724ccd66b4
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 18 Sep 2015 13:22:52 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 23 Sep 2015 09:54:29 +0200

atomic: Implement atomic_read_ctrl()

Provide atomic_read_ctrl() to mirror READ_ONCE_CTRL(), such that we can
more conveniently use atomics in control dependencies.

Since we can assume atomic_read() implies a READ_ONCE(), we must only
emit an extra smp_read_barrier_depends() in order to upgrade to
READ_ONCE_CTRL() semantics.

Requested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/20150918115637.GM3604@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/memory-barriers.txt | 17 +++++++++--------
 include/asm-generic/atomic-long.h |  3 ++-
 include/linux/atomic.h            | 18 ++++++++++++++++++
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 2ba8461..41ffd7e 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -637,7 +637,8 @@ as follows:
 	b = p;  /* BUG: Compiler and CPU can both reorder!!! */
 
 Finally, the READ_ONCE_CTRL() includes an smp_read_barrier_depends()
-that DEC Alpha needs in order to respect control depedencies.
+that DEC Alpha needs in order to respect control dependencies. Alternatively
+use one of atomic{,64}_read_ctrl().
 
 So don't leave out the READ_ONCE_CTRL().
 
@@ -796,9 +797,9 @@ site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.
 
 In summary:
 
-  (*) Control dependencies must be headed by READ_ONCE_CTRL().
-      Or, as a much less preferable alternative, interpose
-      smp_read_barrier_depends() between a READ_ONCE() and the
+  (*) Control dependencies must be headed by READ_ONCE_CTRL(),
+      atomic{,64}_read_ctrl(). Or, as a much less preferable alternative,
+      interpose smp_read_barrier_depends() between a READ_ONCE() and the
       control-dependent write.
 
   (*) Control dependencies can order prior loads against later stores.
@@ -820,10 +821,10 @@ In summary:
       and WRITE_ONCE() can help to preserve the needed conditional.
 
   (*) Control dependencies require that the compiler avoid reordering the
-      dependency into nonexistence.  Careful use of READ_ONCE_CTRL()
-      or smp_read_barrier_depends() can help to preserve your control
-      dependency.  Please see the Compiler Barrier section for more
-      information.
+      dependency into nonexistence.  Careful use of READ_ONCE_CTRL(),
+      atomic{,64}_read_ctrl() or smp_read_barrier_depends() can help to
+      preserve your control dependency.  Please see the Compiler Barrier
+      section for more information.
 
   (*) Control dependencies pair normally with other types of barriers.
 
diff --git a/include/asm-generic/atomic-long.h b/include/asm-generic/atomic-long.h
index c7a5c1a..8942cdc 100644
--- a/include/asm-generic/atomic-long.h
+++ b/include/asm-generic/atomic-long.h
@@ -35,7 +35,7 @@ typedef atomic_t atomic_long_t;
 #endif
 
 #define ATOMIC_LONG_READ_OP(mo)						\
-static inline long atomic_long_read##mo(atomic_long_t *l)		\
+static inline long atomic_long_read##mo(const atomic_long_t *l)		\
 {									\
 	ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;		\
 									\
@@ -43,6 +43,7 @@ static inline long atomic_long_read##mo(atomic_long_t *l)		\
 }
 ATOMIC_LONG_READ_OP()
 ATOMIC_LONG_READ_OP(_acquire)
+ATOMIC_LONG_READ_OP(_ctrl)
 
 #undef ATOMIC_LONG_READ_OP
 
diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 29dafa1..e326469 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -4,6 +4,15 @@
 #include <asm/atomic.h>
 #include <asm/barrier.h>
 
+#ifndef atomic_read_ctrl
+static inline int atomic_read_ctrl(const atomic_t *v)
+{
+	int val = atomic_read(v);
+	smp_read_barrier_depends(); /* Enforce control dependency. */
+	return val;
+}
+#endif
+
 /*
  * Relaxed variants of xchg, cmpxchg and some atomic operations.
  *
@@ -455,6 +464,15 @@ static inline int atomic_dec_if_positive(atomic_t *v)
 #include <asm-generic/atomic64.h>
 #endif
 
+#ifndef atomic64_read_ctrl
+static inline long long atomic64_read_ctrl(const atomic64_t *v)
+{
+	long long val = atomic64_read(v);
+	smp_read_barrier_depends(); /* Enforce control dependency. */
+	return val;
+}
+#endif
+
 #ifndef atomic64_andnot
 static inline void atomic64_andnot(long long i, atomic64_t *v)
 {

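For completeness, a sketch of how the put_pid() fix discussed in this
thread would look on top of this commit (reconstructed from the earlier
messages; this follow-up is not part of the commit above):

	void put_pid(struct pid *pid)
	{
		struct pid_namespace *ns;

		if (!pid)
			return;

		ns = pid->numbers[pid->level].ns;
		if ((atomic_read_ctrl(&pid->count) == 1) ||
		     atomic_dec_and_test(&pid->count)) {
			/* The control dependency keeps the freelist
			 * stores in kmem_cache_free() from hoisting
			 * above the count check. */
			kmem_cache_free(ns->pid_cachep, pid);
			put_pid_ns(ns);
		}
	}
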
^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2015-09-23  8:49 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-17 13:24 [PATCH] kernel: fix data race in put_pid Dmitry Vyukov
2015-09-17 16:08 ` Oleg Nesterov
2015-09-17 16:41   ` Dmitry Vyukov
2015-09-17 17:44     ` Oleg Nesterov
2015-09-17 17:57       ` Dmitry Vyukov
2015-09-17 17:59         ` Dmitry Vyukov
2015-09-17 18:09         ` Oleg Nesterov
2015-09-17 18:38           ` Dmitry Vyukov
2015-09-18  8:51           ` Peter Zijlstra
2015-09-18  8:57             ` Peter Zijlstra
2015-09-18  9:27               ` Peter Zijlstra
2015-09-18 12:31                 ` James Hogan
2015-09-18 12:34                   ` Peter Zijlstra
2015-09-18 15:56               ` Paul E. McKenney
2015-09-18  9:06             ` Dmitry Vyukov
2015-09-18  9:28               ` Will Deacon
2015-09-18  9:33                 ` Peter Zijlstra
2015-09-18 11:22                 ` Peter Zijlstra
2015-09-18 11:30                   ` Will Deacon
2015-09-18 11:50                   ` Dmitry Vyukov
2015-09-18 11:56                     ` Peter Zijlstra
2015-09-18 12:19                       ` Peter Zijlstra
2015-09-18 12:44                         ` Will Deacon
2015-09-18 13:10                           ` Peter Zijlstra
2015-09-18 13:44                       ` Oleg Nesterov
2015-09-18 13:49                         ` Peter Zijlstra
2015-09-18 13:53                         ` Dmitry Vyukov
2015-09-18 14:41                           ` Oleg Nesterov
2015-09-22  8:38                       ` Dmitry Vyukov
2015-09-23  8:48                       ` [tip:locking/core] atomic: Implement atomic_read_ctrl() tip-bot for Peter Zijlstra
2015-09-18 16:15                   ` [PATCH] kernel: fix data race in put_pid Eric Dumazet
2015-09-18 16:20                     ` Peter Zijlstra
2015-09-18 15:57                 ` Paul E. McKenney
2015-09-18 13:28             ` Oleg Nesterov
2015-09-18 13:31               ` Oleg Nesterov
2015-09-18 13:46               ` Peter Zijlstra
2015-09-18 15:00                 ` Oleg Nesterov
2015-09-18 15:30                 ` Oleg Nesterov
2015-09-18 16:00                 ` Paul E. McKenney
2015-09-17 17:56     ` Paul E. McKenney
