All of lore.kernel.org
 help / color / mirror / Atom feed
* Question regarding "Control Dependencies" in memory-barriers.txt
@ 2014-08-04 17:07 Pranith Kumar
  2014-08-04 18:52 ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Pranith Kumar @ 2014-08-04 17:07 UTC (permalink / raw)
  To: Paul McKenney, Peter Zijlstra, LKML

The section "Control Dependencies" in memory-barriers.txt has the
following text:

662 In addition, you need to be careful what you do with the local variable 'q',
663 otherwise the compiler might be able to guess the value and again remove
664 the needed conditional.  For example:
665
666         q = ACCESS_ONCE(a);
667         if (q % MAX) {
668                 barrier();
669                 ACCESS_ONCE(b) = p;
670                 do_something();
671         } else {
672                 barrier();
673                 ACCESS_ONCE(b) = p;
674                 do_something_else();
675         }
676
677 If MAX is defined to be 1, then the compiler knows that (q % MAX) is
678 equal to zero, in which case the compiler is within its rights to
679 transform the above code into the following:
680
681         q = ACCESS_ONCE(a);
682         ACCESS_ONCE(b) = p;
683         do_something_else();

Given that there is an explicit barrier() in both the branches of
if/else statement, how can the above transformation happen? The
compiler cannot just remove the barrier(), right?

I think it will transform to the following if MAX is defined to 1:

q = ACCESS_ONCE(a);
barrier();
ACCESS_ONCE(b) = p;
do_something_else();

and hence the ordering will be preserved. What am I missing here?

-- 
Pranith

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question regarding "Control Dependencies" in memory-barriers.txt
  2014-08-04 17:07 Question regarding "Control Dependencies" in memory-barriers.txt Pranith Kumar
@ 2014-08-04 18:52 ` Paul E. McKenney
  2014-08-04 21:03   ` Pranith Kumar
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2014-08-04 18:52 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Peter Zijlstra, LKML

On Mon, Aug 04, 2014 at 01:07:47PM -0400, Pranith Kumar wrote:
> The section "Control Dependencies" in memory-barriers.txt has the
> following text:
> 
> 662 In addition, you need to be careful what you do with the local variable 'q',
> 663 otherwise the compiler might be able to guess the value and again remove
> 664 the needed conditional.  For example:
> 665
> 666         q = ACCESS_ONCE(a);
> 667         if (q % MAX) {
> 668                 barrier();
> 669                 ACCESS_ONCE(b) = p;
> 670                 do_something();
> 671         } else {
> 672                 barrier();
> 673                 ACCESS_ONCE(b) = p;
> 674                 do_something_else();
> 675         }
> 676
> 677 If MAX is defined to be 1, then the compiler knows that (q % MAX) is
> 678 equal to zero, in which case the compiler is within its rights to
> 679 transform the above code into the following:
> 680
> 681         q = ACCESS_ONCE(a);
> 682         ACCESS_ONCE(b) = p;
> 683         do_something_else();
> 
> Given that there is an explicit barrier() in both the branches of
> if/else statement, how can the above transformation happen? The
> compiler cannot just remove the barrier(), right?

No, the compiler cannot just remove the barrier().  However, it can
notice that "q % MAX" is always zero, which allows it to throw away
the then-clause entirely.

> I think it will transform to the following if MAX is defined to 1:
> 
> q = ACCESS_ONCE(a);
> barrier();
> ACCESS_ONCE(b) = p;
> do_something_else();

Good point, the "barrier()" must be retained, but...

> and hence the ordering will be preserved. What am I missing here?

Because the barrier() primitive affects only the compiler, the CPU
can still reorder things.

In contrast, in the original, the control dependency implied by the "if"
statement prevents the CPU from reordering.

I fixed the example to retain the barrier() with your Reported-by.

						Thanx, Paul


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question regarding "Control Dependencies" in memory-barriers.txt
  2014-08-04 18:52 ` Paul E. McKenney
@ 2014-08-04 21:03   ` Pranith Kumar
  2014-08-05  7:32     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Pranith Kumar @ 2014-08-04 21:03 UTC (permalink / raw)
  To: Paul McKenney; +Cc: Peter Zijlstra, LKML

On Mon, Aug 4, 2014 at 2:52 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>>
>> Given that there is an explicit barrier() in both the branches of
>> if/else statement, how can the above transformation happen? The
>> compiler cannot just remove the barrier(), right?
>
> No, the compiler cannot just remove the barrier().  However, it can
> notice that "q % MAX" is always zero, which allows it to throw away
> the then-clause entirely.
>
>> I think it will transform to the following if MAX is defined to 1:
>>
>> q = ACCESS_ONCE(a);
>> barrier();
>> ACCESS_ONCE(b) = p;
>> do_something_else();
>
> Good point, the "barrier()" must be retained, but...
>
>> and hence the ordering will be preserved. What am I missing here?
>
> Because the barrier() primitive affects only the compiler, the CPU
> can still reorder things.
>
> In contrast, in the original, the control dependency implied by the "if"
> statement prevents the CPU from reordering.
>
> I fixed the example to retain the barrier() with your Reported-by.
>

Thank you for fixing it and explaining this. I have one related
question. Just after the above piece of text, there is the following:

685 This transformation loses the ordering between the load from variable 'a'
686 and the store to variable 'b'.  If you are relying on this ordering, you
687 should do something like the following:
688
689         q = ACCESS_ONCE(a);
690         BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
691         if (q % MAX) {
692                 ACCESS_ONCE(b) = p;
693                 do_something();
694         } else {
695                 ACCESS_ONCE(b) = p;
696                 do_something_else();
697         }
698

How is the BUILD_BUG_ON(MAX <= 1) guaranteeing the ordering w.r.t 'a'
and 'b'. Shouldn't it have barrier() in both the legs of the if()
statement like follows:

@@ -689,9 +689,11 @@ should do something like the following:
        q = ACCESS_ONCE(a);
        BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
        if (q % MAX) {
+               barrier();
                ACCESS_ONCE(b) = p;
                do_something();
        } else {
+               barrier();
                ACCESS_ONCE(b) = p;
                do_something_else();
        }


-- 
Pranith

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question regarding "Control Dependencies" in memory-barriers.txt
  2014-08-04 21:03   ` Pranith Kumar
@ 2014-08-05  7:32     ` Peter Zijlstra
  2014-08-05 12:13       ` Pranith Kumar
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2014-08-05  7:32 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Paul McKenney, LKML

On Mon, Aug 04, 2014 at 05:03:00PM -0400, Pranith Kumar wrote:
> On Mon, Aug 4, 2014 at 2:52 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >>
> >> Given that there is an explicit barrier() in both the branches of
> >> if/else statement, how can the above transformation happen? The
> >> compiler cannot just remove the barrier(), right?
> >
> > No, the compiler cannot just remove the barrier().  However, it can
> > notice that "q % MAX" is always zero, which allows it to throw away
> > the then-clause entirely.
> >
> >> I think it will transform to the following if MAX is defined to 1:
> >>
> >> q = ACCESS_ONCE(a);
> >> barrier();
> >> ACCESS_ONCE(b) = p;
> >> do_something_else();
> >
> > Good point, the "barrier()" must be retained, but...
> >
> >> and hence the ordering will be preserved. What am I missing here?
> >
> > Because the barrier() primitive affects only the compiler, the CPU
> > can still reorder things.
> >
> > In contrast, in the original, the control dependency implied by the "if"
> > statement prevents the CPU from reordering.
> >
> > I fixed the example to retain the barrier() with your Reported-by.
> >
> 
> Thank you for fixing it and explaining this. I have one related
> question. Just after the above piece of text, there is the following:
> 
> 685 This transformation loses the ordering between the load from variable 'a'
> 686 and the store to variable 'b'.  If you are relying on this ordering, you
> 687 should do something like the following:
> 688
> 689         q = ACCESS_ONCE(a);
> 690         BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
> 691         if (q % MAX) {
> 692                 ACCESS_ONCE(b) = p;
> 693                 do_something();
> 694         } else {
> 695                 ACCESS_ONCE(b) = p;
> 696                 do_something_else();
> 697         }
> 698
> 
> How is the BUILD_BUG_ON(MAX <= 1) guaranteeing the ordering w.r.t 'a'
> and 'b'. Shouldn't it have barrier() in both the legs of the if()
> statement like follows:

The BUILD_BUG_ON guarantees that 'q % MAX' isn't a constant, and
therefore the compiler cannot take the conditional out.

And no barrier() doesn't order anything except compiler output. The
ordering here happens by the conditional, CPUs do not do speculative
writes, this means it has to complete the load of q to compute the
branch before it can execute the store of b.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question regarding "Control Dependencies" in memory-barriers.txt
  2014-08-05  7:32     ` Peter Zijlstra
@ 2014-08-05 12:13       ` Pranith Kumar
  2014-08-05 12:58         ` Peter Zijlstra
  2014-08-13 22:44         ` Paul E. McKenney
  0 siblings, 2 replies; 10+ messages in thread
From: Pranith Kumar @ 2014-08-05 12:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Paul McKenney, LKML

On Tue, Aug 5, 2014 at 3:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> 685 This transformation loses the ordering between the load from variable 'a'
>> 686 and the store to variable 'b'.  If you are relying on this ordering, you
>> 687 should do something like the following:
>> 688
>> 689         q = ACCESS_ONCE(a);
>> 690         BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
>> 691         if (q % MAX) {
>> 692                 ACCESS_ONCE(b) = p;
>> 693                 do_something();
>> 694         } else {
>> 695                 ACCESS_ONCE(b) = p;
>> 696                 do_something_else();
>> 697         }
>> 698
>>
>> How is the BUILD_BUG_ON(MAX <= 1) guaranteeing the ordering w.r.t 'a'
>> and 'b'. Shouldn't it have barrier() in both the legs of the if()
>> statement like follows:
>
> The BUILD_BUG_ON guarantees that 'q % MAX' isn't a constant, and
> therefore the compiler cannot take the conditional out.
>
> And no barrier() doesn't order anything except compiler output. The
> ordering here happens by the conditional, CPUs do not do speculative
> writes, this means it has to complete the load of q to compute the
> branch before it can execute the store of b.
>
>

I don't think the write to 'b' here is speculative since it is
happening in both the legs of the if() conditional. The write to b can
be pulled out to before the conditional. Without the barrier(), isn't
the following a valid transformation of the above?

BUILD_BUG_ON(MAX <= 1); /* this will be compiled out if MAX != 1*/
q = ACCESS_ONCE(a);
ACCESS_ONCE(b) = p; / *BUG: No ordering */
if (q % MAX) {
        do_something();
} else {
        do_something_else();
}

I don't see how it is preserving the ordering.

-- 
Pranith

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question regarding "Control Dependencies" in memory-barriers.txt
  2014-08-05 12:13       ` Pranith Kumar
@ 2014-08-05 12:58         ` Peter Zijlstra
  2014-08-13 22:44         ` Paul E. McKenney
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2014-08-05 12:58 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Paul McKenney, LKML

[-- Attachment #1: Type: text/plain, Size: 1074 bytes --]

On Tue, Aug 05, 2014 at 08:13:54AM -0400, Pranith Kumar wrote:
> >> 689         q = ACCESS_ONCE(a);
> >> 690         BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
> >> 691         if (q % MAX) {
> >> 692                 ACCESS_ONCE(b) = p;
> >> 693                 do_something();
> >> 694         } else {
> >> 695                 ACCESS_ONCE(b) = p;
> >> 696                 do_something_else();
> >> 697         }
> >> 698


> I don't think the write to 'b' here is speculative since it is
> happening in both the legs of the if() conditional. The write to b can
> be pulled out to before the conditional. Without the barrier(), isn't
> the following a valid transformation of the above?
> 
> BUILD_BUG_ON(MAX <= 1); /* this will be compiled out if MAX != 1*/
> q = ACCESS_ONCE(a);
> ACCESS_ONCE(b) = p; / *BUG: No ordering */
> if (q % MAX) {
>         do_something();
> } else {
>         do_something_else();
> }
> 
> I don't see how it is preserving the ordering.

Ah, that's what you meant. Yes possibly that's true.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question regarding "Control Dependencies" in memory-barriers.txt
  2014-08-05 12:13       ` Pranith Kumar
  2014-08-05 12:58         ` Peter Zijlstra
@ 2014-08-13 22:44         ` Paul E. McKenney
  2014-08-14  0:10           ` Pranith Kumar
  1 sibling, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2014-08-13 22:44 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Peter Zijlstra, LKML

On Tue, Aug 05, 2014 at 08:13:54AM -0400, Pranith Kumar wrote:
> On Tue, Aug 5, 2014 at 3:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> 685 This transformation loses the ordering between the load from variable 'a'
> >> 686 and the store to variable 'b'.  If you are relying on this ordering, you
> >> 687 should do something like the following:
> >> 688
> >> 689         q = ACCESS_ONCE(a);
> >> 690         BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
> >> 691         if (q % MAX) {
> >> 692                 ACCESS_ONCE(b) = p;
> >> 693                 do_something();
> >> 694         } else {
> >> 695                 ACCESS_ONCE(b) = p;
> >> 696                 do_something_else();
> >> 697         }
> >> 698
> >>
> >> How is the BUILD_BUG_ON(MAX <= 1) guaranteeing the ordering w.r.t 'a'
> >> and 'b'. Shouldn't it have barrier() in both the legs of the if()
> >> statement like follows:
> >
> > The BUILD_BUG_ON guarantees that 'q % MAX' isn't a constant, and
> > therefore the compiler cannot take the conditional out.
> >
> > And no barrier() doesn't order anything except compiler output. The
> > ordering here happens by the conditional, CPUs do not do speculative
> > writes, this means it has to complete the load of q to compute the
> > branch before it can execute the store of b.
> >
> >
> 
> I don't think the write to 'b' here is speculative since it is
> happening in both the legs of the if() conditional. The write to b can
> be pulled out to before the conditional. Without the barrier(), isn't
> the following a valid transformation of the above?
> 
> BUILD_BUG_ON(MAX <= 1); /* this will be compiled out if MAX != 1*/
> q = ACCESS_ONCE(a);
> ACCESS_ONCE(b) = p; / *BUG: No ordering */
> if (q % MAX) {
>         do_something();
> } else {
>         do_something_else();
> }
> 
> I don't see how it is preserving the ordering.

Well, let's try it with a real compiler.  I am running gcc version 4.6.3
(yes, ancient).  Given the following:

	int a;
	int b;

	#define MAX 10
	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
	#define barrier() __asm__ __volatile__("": : :"memory")

	void do_something(void);
	void do_something_else(void);

	void foo(void)
	{
		int q;

		q = ACCESS_ONCE(a);
		if (q % MAX) {
			ACCESS_ONCE(b) = q;
			do_something();
		} else {
			ACCESS_ONCE(b) = q;
			do_something_else();
		}
	}

cc -O3 -c controldep2.c gives the following:

		.file	"controldep2.c"
		.text
		.p2align 4,,15
		.globl	foo
		.type	foo, @function
	foo:
	.LFB0:
		.cfi_startproc
		movl	a, %ecx
		movl	$1717986919, %edx
		movl	%ecx, %eax
		imull	%edx
		movl	%ecx, %eax
		sarl	$31, %eax
		movl	%ecx, b
		sarl	$2, %edx
		subl	%eax, %edx
		leal	(%edx,%edx,4), %eax
		addl	%eax, %eax
		cmpl	%eax, %ecx
		jne	.L4
		jmp	do_something_else
		.p2align 4,,7
		.p2align 3
	.L4:
		jmp	do_something
		.cfi_endproc
	.LFE0:
		.size	foo, .-foo
		.comm	b,4,4
		.comm	a,4,4
		.ident	"GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
		.section	.note.GNU-stack,"",@progbits

As you can see, the assignment to "b" has been hoisted above the "if".
And adding barrier() to each leg of the "if" statement doesn't help,
the store to "b" still gets hoisted.

So this whole "q % MAX" example is bogus...

In fact, if you have the same store on both legs of the "if" statement,
your ordering appears to be toast, at least at high optimization levels.

You would think that I would have tested with -O3 initially.  :-(

Patch below.

							Thanx, Paul

------------------------------------------------------------------------

memory-barriers: Fix description of 2-legged-if-based control dependencies

Sad to say, current compilers really will hoist identical stores from both
branches of an "if" statement to precede the conditional.  This commit
therefore updates the description of control dependencies to reflect this
ugly reality.

Reported-by: Pranith Kumar <bobby.prani@gmail.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 600b45c6e2ad..6062f175abc6 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -574,30 +574,14 @@ However, stores are not speculated.  This means that ordering -is- provided
 in the following example:
 
 	q = ACCESS_ONCE(a);
-	if (ACCESS_ONCE(q)) {
-		ACCESS_ONCE(b) = p;
-	}
-
-Please note that ACCESS_ONCE() is not optional!  Without the ACCESS_ONCE(),
-the compiler is within its rights to transform this example:
-
-	q = a;
 	if (q) {
-		b = p;  /* BUG: Compiler can reorder!!! */
-		do_something();
-	} else {
-		b = p;  /* BUG: Compiler can reorder!!! */
-		do_something_else();
+		ACCESS_ONCE(b) = p;
 	}
 
-into this, which of course defeats the ordering:
-
-	b = p;
-	q = a;
-	if (q)
-		do_something();
-	else
-		do_something_else();
+Please note that ACCESS_ONCE() is not optional!  Without the
+ACCESS_ONCE(), might combine the load from 'a' with other loads from
+'a', and the store to 'b' with other stores to 'b', with possible highly
+counterintuitive effects on ordering.
 
 Worse yet, if the compiler is able to prove (say) that the value of
 variable 'a' is always non-zero, it would be well within its rights
@@ -605,11 +589,12 @@ to optimize the original example by eliminating the "if" statement
 as follows:
 
 	q = a;
-	b = p;  /* BUG: Compiler can reorder!!! */
-	do_something();
+	b = p;  /* BUG: Compiler and CPU can both reorder!!! */
+
+So don't leave out the ACCESS_ONCE().
 
-The solution is again ACCESS_ONCE() and barrier(), which preserves the
-ordering between the load from variable 'a' and the store to variable 'b':
+It is tempting to try to enforce ordering on identical stores on both
+branches of the "if" statement as follows:
 
 	q = ACCESS_ONCE(a);
 	if (q) {
@@ -622,18 +607,11 @@ ordering between the load from variable 'a' and the store to variable 'b':
 		do_something_else();
 	}
 
-The initial ACCESS_ONCE() is required to prevent the compiler from
-proving the value of 'a', and the pair of barrier() invocations are
-required to prevent the compiler from pulling the two identical stores
-to 'b' out from the legs of the "if" statement.
-
-It is important to note that control dependencies absolutely require a
-a conditional.  For example, the following "optimized" version of
-the above example breaks ordering, which is why the barrier() invocations
-are absolutely required if you have identical stores in both legs of
-the "if" statement:
+Unfortunately, current compilers will transform this as follows at high
+optimization levels:
 
 	q = ACCESS_ONCE(a);
+	barrier();
 	ACCESS_ONCE(b) = p;  /* BUG: No ordering vs. load from a!!! */
 	if (q) {
 		/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
@@ -643,21 +621,25 @@ the "if" statement:
 		do_something_else();
 	}
 
-It is of course legal for the prior load to be part of the conditional,
-for example, as follows:
+Now there is no conditional between the load from 'a' and the store to
+'b', which means that the CPU is within its rights to reorder them:
+The conditional is absolutely required, and must be present in the
+assembly code even after all compiler optimizations have been applied.
 
-	if (ACCESS_ONCE(a) > 0) {
-		barrier();
-		ACCESS_ONCE(b) = q / 2;
+So two-legged-if control ordering is guaranteed only when the stores
+differ, for example:
+
+	q = ACCESS_ONCE(a);
+	if (q) {
+		ACCESS_ONCE(b) = p;
 		do_something();
 	} else {
-		barrier();
-		ACCESS_ONCE(b) = q / 3;
+		ACCESS_ONCE(b) = r;
 		do_something_else();
 	}
 
-This will again ensure that the load from variable 'a' is ordered before the
-stores to variable 'b'.
+The initial ACCESS_ONCE() is still required to prevent the compiler from
+proving the value of 'a'.
 
 In addition, you need to be careful what you do with the local variable 'q',
 otherwise the compiler might be able to guess the value and again remove
@@ -665,12 +647,10 @@ the needed conditional.  For example:
 
 	q = ACCESS_ONCE(a);
 	if (q % MAX) {
-		barrier();
 		ACCESS_ONCE(b) = p;
 		do_something();
 	} else {
-		barrier();
-		ACCESS_ONCE(b) = p;
+		ACCESS_ONCE(b) = r;
 		do_something_else();
 	}
 
@@ -679,15 +659,15 @@ equal to zero, in which case the compiler is within its rights to
 transform the above code into the following:
 
 	q = ACCESS_ONCE(a);
-	barrier();
 	ACCESS_ONCE(b) = p;
 	do_something_else();
 
-This transformation fails to require that the CPU respect the ordering
-between the load from variable 'a' and the store to variable 'b'.
-Yes, the barrier() is still there, but it affects only the compiler,
-not the CPU.  Therefore, if you are relying on this ordering, you should
-do something like the following:
+Given this transformation, the CPU is not required to respect the ordering
+between the load from variable 'a' and the store to variable 'b'.  It is
+tempting to add a barrier(), but this does not help.  The conditional
+is gone, and the barrier won't bring it back.  Therefore, if you are
+relying on this ordering, you should make sure that MAX is greater than
+one, perhaps as follows:
 
 	q = ACCESS_ONCE(a);
 	BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
@@ -695,10 +675,14 @@ do something like the following:
 		ACCESS_ONCE(b) = p;
 		do_something();
 	} else {
-		ACCESS_ONCE(b) = p;
+		ACCESS_ONCE(b) = r;
 		do_something_else();
 	}
 
+Please note once again that the stores to 'b' differ.  If they were
+identical, as noted earlier, the compiler could pull this store outside
+of the 'if' statement.
+
 Finally, control dependencies do -not- provide transitivity.  This is
 demonstrated by two related examples, with the initial values of
 x and y both being zero:


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Question regarding "Control Dependencies" in memory-barriers.txt
  2014-08-13 22:44         ` Paul E. McKenney
@ 2014-08-14  0:10           ` Pranith Kumar
  2014-08-14  0:35             ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Pranith Kumar @ 2014-08-14  0:10 UTC (permalink / raw)
  To: Paul McKenney; +Cc: Peter Zijlstra, LKML



On Wed, Aug 13, 2014 at 6:44 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

>         .LFB0:
>                 .cfi_startproc
>                 movl    a, %ecx
>                 movl    $1717986919, %edx
>                 movl    %ecx, %eax
>                 imull   %edx
>                 movl    %ecx, %eax
>                 sarl    $31, %eax
>                 movl    %ecx, b
>                 sarl    $2, %edx
>                 subl    %eax, %edx
>                 leal    (%edx,%edx,4), %eax
>                 addl    %eax, %eax
>                 cmpl    %eax, %ecx
>                 jne     .L4
>                 jmp     do_something_else
>                 .p2align 4,,7
>                 .p2align 3
>         .L4:
>                 jmp     do_something
>                 .cfi_endproc
>         .LFE0:
>                 .size   foo, .-foo
>                 .comm   b,4,4
>                 .comm   a,4,4
>                 .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
>                 .section        .note.GNU-stack,"",@progbits
>
> As you can see, the assignment to "b" has been hoisted above the "if".
> And adding barrier() to each leg of the "if" statement doesn't help,
> the store to "b" still gets hoisted.

That does not match to what I am seeing. I added barrier() to both the
legs and this is the outcome with the same options you used:

0000000000000000 <foo>:
   0:	mov    0x0(%rip),%ecx        # 6 <foo+0x6>
   6:	mov    $0x66666667,%edx
   b:	mov    %ecx,%eax
   d:	imul   %edx
   f:	mov    %ecx,%eax
  11:	sar    $0x1f,%eax
  14:	sar    $0x2,%edx
  17:	sub    %eax,%edx
  19:	lea    (%rdx,%rdx,4),%eax
  1c:	add    %eax,%eax
  1e:	cmp    %eax,%ecx
  20:	jne    30 <foo+0x30>
  22:	mov    %ecx,0x0(%rip)        # 28 <foo+0x28> ACCESS_ONCE(b) = q
  28:	jmpq   2d <foo+0x2d>
  2d:	nopl   (%rax)
  30:	mov    %ecx,0x0(%rip)        # 36 <foo+0x36> ACCESS_ONCE(b) = q
  36:	jmpq   3b <foo+0x3b>

My gcc version is 4.9.1. 

Trying it out with 4.6.4 and 4.7.3 I see what you are seeing! So must be a bug
in older compiler versions.

>
> So this whole "q % MAX" example is bogus...
>
> In fact, if you have the same store on both legs of the "if" statement,
> your ordering appears to be toast, at least at high optimization levels.
>
> You would think that I would have tested with -O3 initially.  :-(

Looks like it is even happening at -O2 with 4.6/4.7 version of gcc. 
Btw, how are you generating the pretty output for assembly? I am using objdump
and it is not as pretty.

> ------------------------------------------------------------------------
>
> memory-barriers: Fix description of 2-legged-if-based control dependencies
>
> Sad to say, current compilers really will hoist identical stores from both
> branches of an "if" statement to precede the conditional.  This commit
> therefore updates the description of control dependencies to reflect this
> ugly reality.
>
> Reported-by: Pranith Kumar <bobby.prani@gmail.com>
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Please see one question below:

>
> +Given this transformation, the CPU is not required to respect the ordering
> +between the load from variable 'a' and the store to variable 'b'.  It is
> +tempting to add a barrier(), but this does not help.  The conditional
> +is gone, and the barrier won't bring it back.  Therefore, if you are
> +relying on this ordering, you should make sure that MAX is greater than
> +one, perhaps as follows:
>
>         q = ACCESS_ONCE(a);
>         BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
> @@ -695,10 +675,14 @@ do something like the following:
>                 ACCESS_ONCE(b) = p;
>                 do_something();
>         } else {
> -               ACCESS_ONCE(b) = p;
> +               ACCESS_ONCE(b) = r;
>                 do_something_else();
>         }
>
> +Please note once again that the stores to 'b' differ.  If they were
> +identical, as noted earlier, the compiler could pull this store outside
> +of the 'if' statement.

If the stores to 'b' differ, then there is no issue. Why not document how to
avoid re-ordering in the case where both the stores are the same? In that case
using a stronger barrier like mb() should be sufficient for both the compiler
and the CPU to avoid re-ordering, right?

-- 
Pranith

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question regarding "Control Dependencies" in memory-barriers.txt
  2014-08-14  0:10           ` Pranith Kumar
@ 2014-08-14  0:35             ` Paul E. McKenney
  2014-08-14  1:03               ` Pranith Kumar
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2014-08-14  0:35 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Peter Zijlstra, LKML

On Wed, Aug 13, 2014 at 08:10:22PM -0400, Pranith Kumar wrote:
> 
> 
> On Wed, Aug 13, 2014 at 6:44 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> >         .LFB0:
> >                 .cfi_startproc
> >                 movl    a, %ecx
> >                 movl    $1717986919, %edx
> >                 movl    %ecx, %eax
> >                 imull   %edx
> >                 movl    %ecx, %eax
> >                 sarl    $31, %eax
> >                 movl    %ecx, b
> >                 sarl    $2, %edx
> >                 subl    %eax, %edx
> >                 leal    (%edx,%edx,4), %eax
> >                 addl    %eax, %eax
> >                 cmpl    %eax, %ecx
> >                 jne     .L4
> >                 jmp     do_something_else
> >                 .p2align 4,,7
> >                 .p2align 3
> >         .L4:
> >                 jmp     do_something
> >                 .cfi_endproc
> >         .LFE0:
> >                 .size   foo, .-foo
> >                 .comm   b,4,4
> >                 .comm   a,4,4
> >                 .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
> >                 .section        .note.GNU-stack,"",@progbits
> >
> > As you can see, the assignment to "b" has been hoisted above the "if".
> > And adding barrier() to each leg of the "if" statement doesn't help,
> > the store to "b" still gets hoisted.
> 
> That does not match to what I am seeing. I added barrier() to both the
> legs and this is the outcome with the same options you used:
> 
> 0000000000000000 <foo>:
>    0:	mov    0x0(%rip),%ecx        # 6 <foo+0x6>
>    6:	mov    $0x66666667,%edx
>    b:	mov    %ecx,%eax
>    d:	imul   %edx
>    f:	mov    %ecx,%eax
>   11:	sar    $0x1f,%eax
>   14:	sar    $0x2,%edx
>   17:	sub    %eax,%edx
>   19:	lea    (%rdx,%rdx,4),%eax
>   1c:	add    %eax,%eax
>   1e:	cmp    %eax,%ecx
>   20:	jne    30 <foo+0x30>
>   22:	mov    %ecx,0x0(%rip)        # 28 <foo+0x28> ACCESS_ONCE(b) = q
>   28:	jmpq   2d <foo+0x2d>
>   2d:	nopl   (%rax)
>   30:	mov    %ecx,0x0(%rip)        # 36 <foo+0x36> ACCESS_ONCE(b) = q
>   36:	jmpq   3b <foo+0x3b>
> 
> My gcc version is 4.9.1. 
> 
> Trying it out with 4.6.4 and 4.7.3 I see what you are seeing! So must be a bug
> in older compiler versions.

Well, the Linux kernel will eventually drop support for 4.6 and 4.7.
However, it would be good to find out whether or not this difference
in behavior is intentional.

Fortunately, adding an actual instruction to the asm prevents the
hoisting of the store.  In fact, a "nop" suffices.  But yuck...

> > So this whole "q % MAX" example is bogus...
> >
> > In fact, if you have the same store on both legs of the "if" statement,
> > your ordering appears to be toast, at least at high optimization levels.
> >
> > You would think that I would have tested with -O3 initially.  :-(
> 
> Looks like it is even happening at -O2 with 4.6/4.7 version of gcc. 
> Btw, how are you generating the pretty output for assembly? I am using objdump
> and it is not as pretty.

"cc -S" is your friend here.  ;-)

> > ------------------------------------------------------------------------
> >
> > memory-barriers: Fix description of 2-legged-if-based control dependencies
> >
> > Sad to say, current compilers really will hoist identical stores from both
> > branches of an "if" statement to precede the conditional.  This commit
> > therefore updates the description of control dependencies to reflect this
> > ugly reality.
> >
> > Reported-by: Pranith Kumar <bobby.prani@gmail.com>
> > Reported-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Please see one question below:
> 
> >
> > +Given this transformation, the CPU is not required to respect the ordering
> > +between the load from variable 'a' and the store to variable 'b'.  It is
> > +tempting to add a barrier(), but this does not help.  The conditional
> > +is gone, and the barrier won't bring it back.  Therefore, if you are
> > +relying on this ordering, you should make sure that MAX is greater than
> > +one, perhaps as follows:
> >
> >         q = ACCESS_ONCE(a);
> >         BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
> > @@ -695,10 +675,14 @@ do something like the following:
> >                 ACCESS_ONCE(b) = p;
> >                 do_something();
> >         } else {
> > -               ACCESS_ONCE(b) = p;
> > +               ACCESS_ONCE(b) = r;
> >                 do_something_else();
> >         }
> >
> > +Please note once again that the stores to 'b' differ.  If they were
> > +identical, as noted earlier, the compiler could pull this store outside
> > +of the 'if' statement.
> 
> If the stores to 'b' differ, then there is no issue. Why not document how to
> avoid re-ordering in the case where both the stores are the same? In that case
> using a stronger barrier like mb() should be sufficient for both the compiler
> and the CPU to avoid re-ordering, right?

Like this?  (On top of the earlier patch.)

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 6062f175abc6..22a969cdd476 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -625,9 +625,20 @@ Now there is no conditional between the load from 'a' and the store to
 'b', which means that the CPU is within its rights to reorder them:
 The conditional is absolutely required, and must be present in the
 assembly code even after all compiler optimizations have been applied.
+Therefore, if you need ordering in this example, you need explicit
+memory barriers, for example, smp_store_release():
 
-So two-legged-if control ordering is guaranteed only when the stores
-differ, for example:
+	q = ACCESS_ONCE(a);
+	if (q) {
+		smp_store_release(&b, p);
+		do_something();
+	} else {
+		smp_store_release(&b, p);
+		do_something_else();
+	}
+
+In contrast, without explicit memory barriers, two-legged-if control
+ordering is guaranteed only when the stores differ, for example:
 
 	q = ACCESS_ONCE(a);
 	if (q) {


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Question regarding "Control Dependencies" in memory-barriers.txt
  2014-08-14  0:35             ` Paul E. McKenney
@ 2014-08-14  1:03               ` Pranith Kumar
  0 siblings, 0 replies; 10+ messages in thread
From: Pranith Kumar @ 2014-08-14  1:03 UTC (permalink / raw)
  To: Paul McKenney; +Cc: Peter Zijlstra, LKML

On Wed, Aug 13, 2014 at 8:35 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>>
>> If the stores to 'b' differ, then there is no issue. Why not document how to
>> avoid re-ordering in the case where both the stores are the same? In that case
>> using a stronger barrier like mb() should be sufficient for both the compiler
>> and the CPU to avoid re-ordering, right?
>
> Like this?  (On top of the earlier patch.)
>

That looks good, thank you!


>                                                         Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 6062f175abc6..22a969cdd476 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -625,9 +625,20 @@ Now there is no conditional between the load from 'a' and the store to
>  'b', which means that the CPU is within its rights to reorder them:
>  The conditional is absolutely required, and must be present in the
>  assembly code even after all compiler optimizations have been applied.
> +Therefore, if you need ordering in this example, you need explicit
> +memory barriers, for example, smp_store_release():
>
> -So two-legged-if control ordering is guaranteed only when the stores
> -differ, for example:
> +       q = ACCESS_ONCE(a);
> +       if (q) {
> +               smp_store_release(&b, p);
> +               do_something();
> +       } else {
> +               smp_store_release(&b, p);
> +               do_something_else();
> +       }
> +
> +In contrast, without explicit memory barriers, two-legged-if control
> +ordering is guaranteed only when the stores differ, for example:
>
>         q = ACCESS_ONCE(a);
>         if (q) {
>

-- 
Pranith

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-08-14  1:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-04 17:07 Question regarding "Control Dependencies" in memory-barriers.txt Pranith Kumar
2014-08-04 18:52 ` Paul E. McKenney
2014-08-04 21:03   ` Pranith Kumar
2014-08-05  7:32     ` Peter Zijlstra
2014-08-05 12:13       ` Pranith Kumar
2014-08-05 12:58         ` Peter Zijlstra
2014-08-13 22:44         ` Paul E. McKenney
2014-08-14  0:10           ` Pranith Kumar
2014-08-14  0:35             ` Paul E. McKenney
2014-08-14  1:03               ` Pranith Kumar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.