All of lore.kernel.org
* [PATCH] advsync: Fix store-buffering sequence table
@ 2017-07-04 15:23 Akira Yokosawa
  2017-07-04 22:21 ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Akira Yokosawa @ 2017-07-04 15:23 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

From 2845eb208a6e63493997de47293a47ef774a9d49 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Tue, 4 Jul 2017 23:18:30 +0900
Subject: [PATCH] advsync: Fix store-buffering sequence table

Row 6 of the table added in commit 2d5bf8d25a71 ("advsync: Add
memory-barriered store-buffering example") needs some context
adjustment.

Also tweak horizontal spacing of wide tables for one-column layout.
Also add a few words to the footnote giving definition of
__atomic_thread_fence().

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 advsync/memorybarriers.tex | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/advsync/memorybarriers.tex b/advsync/memorybarriers.tex
index 4ae3ca8..f26a7c5 100644
--- a/advsync/memorybarriers.tex
+++ b/advsync/memorybarriers.tex
@@ -174,7 +174,7 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.

 \begin{table*}
 \small
-\centering
+\centering\OneColumnHSpace{-.1in}
 \begin{tabular}{r||l|l|l||l|l|l}
 	& \multicolumn{3}{c||}{CPU 0} & \multicolumn{3}{c}{CPU 1} \\
 	\cline{2-7}
@@ -318,6 +318,8 @@ ordering and memory barriers work, read on!
 The first stop is
 Figure~\ref{fig:advsync:Memory Ordering: Store-Buffering Litmus Test},
 which has \co{__atomic_thread_fence()} directives\footnote{
+	One of GCC's atomic intrinsics briefly introduced in
+	Section~\ref{sec:toolsoftrade:Atomic Operations (C11)}.
 	Similar to the Linux kernel's \co{smp_mb()} full memory barrier.}
 placed between
 the store and load in both \co{P0()} and \co{P1()}, but is otherwise
@@ -339,7 +341,7 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.

 \begin{table*}
 \small
-\centering
+\centering\OneColumnHSpace{-0.75in}
 \begin{tabular}{r||l|l|l||l|l|l}
 	& \multicolumn{3}{c||}{CPU 0} & \multicolumn{3}{c}{CPU 1} \\
 	\cline{2-7}
@@ -362,8 +364,8 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.
 	5 & (Finish store) & & \tco{x0==2} &
 		(Finish store) & & \tco{x1==2} \\
 	\hline
-	6 & \tco{r2 = *x1;} (2) & \tco{x0==2} & \tco{x1==0} &
-		\tco{r2 = *x0;} (2) & \tco{x1==2} & \tco{x0==0} \\
+	6 & \tco{r2 = *x1;} (2) & & \tco{x1==2} &
+		\tco{r2 = *x0;} (2) & & \tco{x0==2} \\
 \end{tabular}
 \caption{Memory Ordering: Store-Buffering Sequence of Events}
 \label{tab:advsync:Memory Ordering: Store-Buffering Sequence of Events}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-04 15:23 [PATCH] advsync: Fix store-buffering sequence table Akira Yokosawa
@ 2017-07-04 22:21 ` Paul E. McKenney
  2017-07-05 14:22   ` Akira Yokosawa
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2017-07-04 22:21 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Wed, Jul 05, 2017 at 12:23:09AM +0900, Akira Yokosawa wrote:
> >From 2845eb208a6e63493997de47293a47ef774a9d49 Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa <akiyks@gmail.com>
> Date: Tue, 4 Jul 2017 23:18:30 +0900
> Subject: [PATCH] advsync: Fix store-buffering sequence table
> 
> Row 6 of the table added in commit 2d5bf8d25a71 ("advsync: Add
> memory-barriered store-buffering example") needs some context
> adjustment.
> 
> Also tweak horizontal spacing of wide tables for one-column layout.
> Also add a few words to the footnote giving definition of
> __atomic_thread_fence().
> 
> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

Good catches!  Queued and pushed.  I reworded the footnote a bit, so
please let me know if I overdid it.

							Thanx, Paul

> ---
>  advsync/memorybarriers.tex | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/advsync/memorybarriers.tex b/advsync/memorybarriers.tex
> index 4ae3ca8..f26a7c5 100644
> --- a/advsync/memorybarriers.tex
> +++ b/advsync/memorybarriers.tex
> @@ -174,7 +174,7 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.
> 
>  \begin{table*}
>  \small
> -\centering
> +\centering\OneColumnHSpace{-.1in}
>  \begin{tabular}{r||l|l|l||l|l|l}
>  	& \multicolumn{3}{c||}{CPU 0} & \multicolumn{3}{c}{CPU 1} \\
>  	\cline{2-7}
> @@ -318,6 +318,8 @@ ordering and memory barriers work, read on!
>  The first stop is
>  Figure~\ref{fig:advsync:Memory Ordering: Store-Buffering Litmus Test},
>  which has \co{__atomic_thread_fence()} directives\footnote{
> +	One of GCC's atomic intrinsics briefly introduced in
> +	Section~\ref{sec:toolsoftrade:Atomic Operations (C11)}.
>  	Similar to the Linux kernel's \co{smp_mb()} full memory barrier.}
>  placed between
>  the store and load in both \co{P0()} and \co{P1()}, but is otherwise
> @@ -339,7 +341,7 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.
> 
>  \begin{table*}
>  \small
> -\centering
> +\centering\OneColumnHSpace{-0.75in}
>  \begin{tabular}{r||l|l|l||l|l|l}
>  	& \multicolumn{3}{c||}{CPU 0} & \multicolumn{3}{c}{CPU 1} \\
>  	\cline{2-7}
> @@ -362,8 +364,8 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.
>  	5 & (Finish store) & & \tco{x0==2} &
>  		(Finish store) & & \tco{x1==2} \\
>  	\hline
> -	6 & \tco{r2 = *x1;} (2) & \tco{x0==2} & \tco{x1==0} &
> -		\tco{r2 = *x0;} (2) & \tco{x1==2} & \tco{x0==0} \\
> +	6 & \tco{r2 = *x1;} (2) & & \tco{x1==2} &
> +		\tco{r2 = *x0;} (2) & & \tco{x0==2} \\
>  \end{tabular}
>  \caption{Memory Ordering: Store-Buffering Sequence of Events}
>  \label{tab:advsync:Memory Ordering: Store-Buffering Sequence of Events}
> -- 
> 2.7.4
> 



* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-04 22:21 ` Paul E. McKenney
@ 2017-07-05 14:22   ` Akira Yokosawa
  2017-07-05 15:40     ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Akira Yokosawa @ 2017-07-05 14:22 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

On 2017/07/04 15:21:38 -0700, Paul E. McKenney wrote:
> On Wed, Jul 05, 2017 at 12:23:09AM +0900, Akira Yokosawa wrote:
>> >From 2845eb208a6e63493997de47293a47ef774a9d49 Mon Sep 17 00:00:00 2001
>> From: Akira Yokosawa <akiyks@gmail.com>
>> Date: Tue, 4 Jul 2017 23:18:30 +0900
>> Subject: [PATCH] advsync: Fix store-buffering sequence table
>>
>> Row 6 of the table added in commit 2d5bf8d25a71 ("advsync: Add
>> memory-barriered store-buffering example") needs some context
>> adjustment.
>>
>> Also tweak horizontal spacing of wide tables for one-column layout.
>> Also add a few words to the footnote giving definition of
>> __atomic_thread_fence().
>>
>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> 
> Good catches!  Queued and pushed.  I reworded the footnote a bit, so
> please let me know if I overdid it.

After your changes in commit 036372ac2573 ("advsync: Use gcc's C11-like
intrinsics to avoid data races"), this footnote seems verbose, doesn't it?

But, I'm not so much a fan of the changes of your commit.
It becomes hard to see the relation of lines in litmus tests and rows
in the tables. Also, those intrinsics have fairly large overheads.

How about using "volatile" in thread arg declaration such as the following?

---
C C-SB+o-o+o-o
{
}

P0(volatile int *x0, volatile int *x1)
{
	int r2;

	*x0 = 2;
	r2 = *x1;
}


P1(volatile int *x0, volatile int *x1)
{
	int r2;

	*x1 = 2;
	r2 = *x0;
}

exists (1:r2=0 /\ 0:r2=0)
---

If all you need is to prevent memory accesses from being optimized away,
they should suffice. But they might be unpopular among the kernel community.

I checked the generated C code in cross-compiling mode of litmus7, and
the volatile-ness is reflected there.

Thoughts?

          Thanks, Akira

> 
> 							Thanx, Paul
> 
>> ---
>>  advsync/memorybarriers.tex | 10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/advsync/memorybarriers.tex b/advsync/memorybarriers.tex
>> index 4ae3ca8..f26a7c5 100644
>> --- a/advsync/memorybarriers.tex
>> +++ b/advsync/memorybarriers.tex
>> @@ -174,7 +174,7 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.
>>
>>  \begin{table*}
>>  \small
>> -\centering
>> +\centering\OneColumnHSpace{-.1in}
>>  \begin{tabular}{r||l|l|l||l|l|l}
>>  	& \multicolumn{3}{c||}{CPU 0} & \multicolumn{3}{c}{CPU 1} \\
>>  	\cline{2-7}
>> @@ -318,6 +318,8 @@ ordering and memory barriers work, read on!
>>  The first stop is
>>  Figure~\ref{fig:advsync:Memory Ordering: Store-Buffering Litmus Test},
>>  which has \co{__atomic_thread_fence()} directives\footnote{
>> +	One of GCC's atomic intrinsics briefly introduced in
>> +	Section~\ref{sec:toolsoftrade:Atomic Operations (C11)}.
>>  	Similar to the Linux kernel's \co{smp_mb()} full memory barrier.}
>>  placed between
>>  the store and load in both \co{P0()} and \co{P1()}, but is otherwise
>> @@ -339,7 +341,7 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.
>>
>>  \begin{table*}
>>  \small
>> -\centering
>> +\centering\OneColumnHSpace{-0.75in}
>>  \begin{tabular}{r||l|l|l||l|l|l}
>>  	& \multicolumn{3}{c||}{CPU 0} & \multicolumn{3}{c}{CPU 1} \\
>>  	\cline{2-7}
>> @@ -362,8 +364,8 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.
>>  	5 & (Finish store) & & \tco{x0==2} &
>>  		(Finish store) & & \tco{x1==2} \\
>>  	\hline
>> -	6 & \tco{r2 = *x1;} (2) & \tco{x0==2} & \tco{x1==0} &
>> -		\tco{r2 = *x0;} (2) & \tco{x1==2} & \tco{x0==0} \\
>> +	6 & \tco{r2 = *x1;} (2) & & \tco{x1==2} &
>> +		\tco{r2 = *x0;} (2) & & \tco{x0==2} \\
>>  \end{tabular}
>>  \caption{Memory Ordering: Store-Buffering Sequence of Events}
>>  \label{tab:advsync:Memory Ordering: Store-Buffering Sequence of Events}
>> -- 
>> 2.7.4
>>
> 
> 



* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-05 14:22   ` Akira Yokosawa
@ 2017-07-05 15:40     ` Paul E. McKenney
  2017-07-05 15:50       ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2017-07-05 15:40 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Wed, Jul 05, 2017 at 11:22:52PM +0900, Akira Yokosawa wrote:
> On 2017/07/04 15:21:38 -0700, Paul E. McKenney wrote:
> > On Wed, Jul 05, 2017 at 12:23:09AM +0900, Akira Yokosawa wrote:
> >> >From 2845eb208a6e63493997de47293a47ef774a9d49 Mon Sep 17 00:00:00 2001
> >> From: Akira Yokosawa <akiyks@gmail.com>
> >> Date: Tue, 4 Jul 2017 23:18:30 +0900
> >> Subject: [PATCH] advsync: Fix store-buffering sequence table
> >>
> >> Row 6 of the table added in commit 2d5bf8d25a71 ("advsync: Add
> >> memory-barriered store-buffering example") needs some context
> >> adjustment.
> >>
> >> Also tweak horizontal spacing of wide tables for one-column layout.
> >> Also add a few words to the footnote giving definition of
> >> __atomic_thread_fence().
> >>
> >> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> > 
> > Good catches!  Queued and pushed.  I reworded the footnote a bit, so
> > please let me know if I overdid it.
> 
> After your changes in commit 036372ac2573 ("advsync: Use gcc's C11-like
> intrinsics to avoid data races"), this footnote seems verbose, doesn't it?
> 
> But, I'm not so much a fan of the changes of your commit.
> It becomes hard to see the relation of lines in litmus tests and rows
> in the tables. Also, those intrinsics have fairly large overheads.

They certainly are ugly, no two ways about that!  ;-)

> How about using "volatile" in thread arg declaration such as the following?
> 
> ---
> C C-SB+o-o+o-o
> {
> }
> 
> P0(volatile int *x0, volatile int *x1)
> {
> 	int r2;
> 
> 	*x0 = 2;
> 	r2 = *x1;
> }
> 
> 
> P1(volatile int *x0, volatile int *x1)
> {
> 	int r2;
> 
> 	*x1 = 2;
> 	r2 = *x0;
> }
> 
> exists (1:r2=0 /\ 0:r2=0)
> ---
> 
> If all you need is to prevent memory accesses from being optimized away,
> they should suffice. But they might be unpopular among kernel community.

To say nothing of their unpopularity among the C11 and C++11
communities!

> I checked the generated C code in cross-compiling mode of litmus7, and
> the volatile-ness is reflected there.

And it also works just fine without the volatile -- the litmus7 tool
does the translation so as to avoid destructive compiler optimizations.

I am checking with the litmus7 people to see if there is any way to
map identifiers.  Some of the other tools support a "-macros"
command-line argument, which would allow mapping from "smp_mb()" to
"__atomic_thread_fence(__ATOMIC_SEQ_CST)", but not litmus7.

So I cannot go with "volatile", but let's see if I can do something
better than the gcc intrinsics.

							Thanx, Paul

> Thoughts?
> 
>           Thanks, Akira
> 
> > 
> > 							Thanx, Paul
> > 
> >> ---
> >>  advsync/memorybarriers.tex | 10 ++++++----
> >>  1 file changed, 6 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/advsync/memorybarriers.tex b/advsync/memorybarriers.tex
> >> index 4ae3ca8..f26a7c5 100644
> >> --- a/advsync/memorybarriers.tex
> >> +++ b/advsync/memorybarriers.tex
> >> @@ -174,7 +174,7 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.
> >>
> >>  \begin{table*}
> >>  \small
> >> -\centering
> >> +\centering\OneColumnHSpace{-.1in}
> >>  \begin{tabular}{r||l|l|l||l|l|l}
> >>  	& \multicolumn{3}{c||}{CPU 0} & \multicolumn{3}{c}{CPU 1} \\
> >>  	\cline{2-7}
> >> @@ -318,6 +318,8 @@ ordering and memory barriers work, read on!
> >>  The first stop is
> >>  Figure~\ref{fig:advsync:Memory Ordering: Store-Buffering Litmus Test},
> >>  which has \co{__atomic_thread_fence()} directives\footnote{
> >> +	One of GCC's atomic intrinsics briefly introduced in
> >> +	Section~\ref{sec:toolsoftrade:Atomic Operations (C11)}.
> >>  	Similar to the Linux kernel's \co{smp_mb()} full memory barrier.}
> >>  placed between
> >>  the store and load in both \co{P0()} and \co{P1()}, but is otherwise
> >> @@ -339,7 +341,7 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.
> >>
> >>  \begin{table*}
> >>  \small
> >> -\centering
> >> +\centering\OneColumnHSpace{-0.75in}
> >>  \begin{tabular}{r||l|l|l||l|l|l}
> >>  	& \multicolumn{3}{c||}{CPU 0} & \multicolumn{3}{c}{CPU 1} \\
> >>  	\cline{2-7}
> >> @@ -362,8 +364,8 @@ Figure~\ref{fig:advsync:Memory Misordering: Store-Buffering Litmus Test}.
> >>  	5 & (Finish store) & & \tco{x0==2} &
> >>  		(Finish store) & & \tco{x1==2} \\
> >>  	\hline
> >> -	6 & \tco{r2 = *x1;} (2) & \tco{x0==2} & \tco{x1==0} &
> >> -		\tco{r2 = *x0;} (2) & \tco{x1==2} & \tco{x0==0} \\
> >> +	6 & \tco{r2 = *x1;} (2) & & \tco{x1==2} &
> >> +		\tco{r2 = *x0;} (2) & & \tco{x0==2} \\
> >>  \end{tabular}
> >>  \caption{Memory Ordering: Store-Buffering Sequence of Events}
> >>  \label{tab:advsync:Memory Ordering: Store-Buffering Sequence of Events}
> >> -- 
> >> 2.7.4
> >>
> > 
> > 
> 



* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-05 15:40     ` Paul E. McKenney
@ 2017-07-05 15:50       ` Paul E. McKenney
  2017-07-05 17:32         ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2017-07-05 15:50 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Wed, Jul 05, 2017 at 08:40:24AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 05, 2017 at 11:22:52PM +0900, Akira Yokosawa wrote:
> > On 2017/07/04 15:21:38 -0700, Paul E. McKenney wrote:
> > > On Wed, Jul 05, 2017 at 12:23:09AM +0900, Akira Yokosawa wrote:
> > >> >From 2845eb208a6e63493997de47293a47ef774a9d49 Mon Sep 17 00:00:00 2001
> > >> From: Akira Yokosawa <akiyks@gmail.com>
> > >> Date: Tue, 4 Jul 2017 23:18:30 +0900
> > >> Subject: [PATCH] advsync: Fix store-buffering sequence table
> > >>
> > >> Row 6 of the table added in commit 2d5bf8d25a71 ("advsync: Add
> > >> memory-barriered store-buffering example") needs some context
> > >> adjustment.
> > >>
> > >> Also tweak horizontal spacing of wide tables for one-column layout.
> > >> Also add a few words to the footnote giving definition of
> > >> __atomic_thread_fence().
> > >>
> > >> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> > > 
> > > Good catches!  Queued and pushed.  I reworded the footnote a bit, so
> > > please let me know if I overdid it.
> > 
> > After your changes in commit 036372ac2573 ("advsync: Use gcc's C11-like
> > intrinsics to avoid data races"), this footnote seems verbose, doesn't it?
> > 
> > But, I'm not so much a fan of the changes of your commit.
> > It becomes hard to see the relation of lines in litmus tests and rows
> > in the tables. Also, those intrinsics have fairly large overheads.
> 
> They certainly are ugly, no two ways about that!  ;-)
> 
> > How about using "volatile" in thread arg declaration such as the following?
> > 
> > ---
> > C C-SB+o-o+o-o
> > {
> > }
> > 
> > P0(volatile int *x0, volatile int *x1)
> > {
> > 	int r2;
> > 
> > 	*x0 = 2;
> > 	r2 = *x1;
> > }
> > 
> > 
> > P1(volatile int *x0, volatile int *x1)
> > {
> > 	int r2;
> > 
> > 	*x1 = 2;
> > 	r2 = *x0;
> > }
> > 
> > exists (1:r2=0 /\ 0:r2=0)
> > ---
> > 
> > If all you need is to prevent memory accesses from being optimized away,
> > they should suffice. But they might be unpopular among kernel community.
> 
> To say nothing of their unpopularity among the C11 and C++11
> communities!
> 
> > I checked the generated C code in cross-compiling mode of litmus7, and
> > the volatile-ness is reflected there.
> 
> And it also works just fine without the volatile -- the litmus7 tool
> does the translation so as to avoid destructive compiler optimizations.
> 
> I am checking with the litmus7 people to see if there is any way to
> map identifiers.  Some of the other tools support a "-macros"
> command-line argument, which would allow mapping from "smp_mb()" to
> "__atomic_thread_fence(__ATOMIC_SEQ_CST)", but not litmus7.
> 
> So I cannot go with "volatile", but let's see if I can do something
> better than the gcc intrinsics.

And if the litmus7 guys don't have anything for me, I can always write a
script to do the conversions.  So either way, we will have kernel-style
notation at some point.  ;-)

							Thanx, Paul



* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-05 15:50       ` Paul E. McKenney
@ 2017-07-05 17:32         ` Paul E. McKenney
  2017-07-05 22:15           ` Akira Yokosawa
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2017-07-05 17:32 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Wed, Jul 05, 2017 at 08:50:04AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 05, 2017 at 08:40:24AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 05, 2017 at 11:22:52PM +0900, Akira Yokosawa wrote:
> > > On 2017/07/04 15:21:38 -0700, Paul E. McKenney wrote:
> > > > On Wed, Jul 05, 2017 at 12:23:09AM +0900, Akira Yokosawa wrote:
> > > >> >From 2845eb208a6e63493997de47293a47ef774a9d49 Mon Sep 17 00:00:00 2001
> > > >> From: Akira Yokosawa <akiyks@gmail.com>
> > > >> Date: Tue, 4 Jul 2017 23:18:30 +0900
> > > >> Subject: [PATCH] advsync: Fix store-buffering sequence table
> > > >>
> > > >> Row 6 of the table added in commit 2d5bf8d25a71 ("advsync: Add
> > > >> memory-barriered store-buffering example") needs some context
> > > >> adjustment.
> > > >>
> > > >> Also tweak horizontal spacing of wide tables for one-column layout.
> > > >> Also add a few words to the footnote giving definition of
> > > >> __atomic_thread_fence().
> > > >>
> > > >> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> > > > 
> > > > Good catches!  Queued and pushed.  I reworded the footnote a bit, so
> > > > please let me know if I overdid it.
> > > 
> > > After your changes in commit 036372ac2573 ("advsync: Use gcc's C11-like
> > > intrinsics to avoid data races"), this footnote seems verbose, doesn't it?
> > > 
> > > But, I'm not so much a fan of the changes of your commit.
> > > It becomes hard to see the relation of lines in litmus tests and rows
> > > in the tables. Also, those intrinsics have fairly large overheads.
> > 
> > They certainly are ugly, no two ways about that!  ;-)
> > 
> > > How about using "volatile" in thread arg declaration such as the following?
> > > 
> > > ---
> > > C C-SB+o-o+o-o
> > > {
> > > }
> > > 
> > > P0(volatile int *x0, volatile int *x1)
> > > {
> > > 	int r2;
> > > 
> > > 	*x0 = 2;
> > > 	r2 = *x1;
> > > }
> > > 
> > > 
> > > P1(volatile int *x0, volatile int *x1)
> > > {
> > > 	int r2;
> > > 
> > > 	*x1 = 2;
> > > 	r2 = *x0;
> > > }
> > > 
> > > exists (1:r2=0 /\ 0:r2=0)
> > > ---
> > > 
> > > If all you need is to prevent memory accesses from being optimized away,
> > > they should suffice. But they might be unpopular among kernel community.
> > 
> > To say nothing of their unpopularity among the C11 and C++11
> > communities!
> > 
> > > I checked the generated C code in cross-compiling mode of litmus7, and
> > > the volatile-ness is reflected there.
> > 
> > And it also works just fine without the volatile -- the litmus7 tool
> > does the translation so as to avoid destructive compiler optimizations.
> > 
> > I am checking with the litmus7 people to see if there is any way to
> > map identifiers.  Some of the other tools support a "-macros"
> > command-line argument, which would allow mapping from "smp_mb()" to
> > "__atomic_thread_fence(__ATOMIC_SEQ_CST)", but not litmus7.
> > 
> > So I cannot go with "volatile", but let's see if I can do something
> > better than the gcc intrinsics.
> 
> And if the litmus7 guys don't have anything for me, I can always write a
> script to do the conversions.  So either way, we will have kernel-style
> notation at some point.  ;-)

And it turns out that the contents of a second "{}" in a litmus7 test
is pulled directly into the output code, so an easy change.  ;-)

							Thanx, Paul



* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-05 17:32         ` Paul E. McKenney
@ 2017-07-05 22:15           ` Akira Yokosawa
  2017-07-05 22:32             ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Akira Yokosawa @ 2017-07-05 22:15 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

On 2017/07/05 10:32:49 -0700, Paul E. McKenney wrote:
> On Wed, Jul 05, 2017 at 08:50:04AM -0700, Paul E. McKenney wrote:
>> On Wed, Jul 05, 2017 at 08:40:24AM -0700, Paul E. McKenney wrote:
>>> On Wed, Jul 05, 2017 at 11:22:52PM +0900, Akira Yokosawa wrote:
>>>> On 2017/07/04 15:21:38 -0700, Paul E. McKenney wrote:
>>>>> On Wed, Jul 05, 2017 at 12:23:09AM +0900, Akira Yokosawa wrote:
>>>>>> >From 2845eb208a6e63493997de47293a47ef774a9d49 Mon Sep 17 00:00:00 2001
>>>>>> From: Akira Yokosawa <akiyks@gmail.com>
>>>>>> Date: Tue, 4 Jul 2017 23:18:30 +0900
>>>>>> Subject: [PATCH] advsync: Fix store-buffering sequence table
>>>>>>
>>>>>> Row 6 of the table added in commit 2d5bf8d25a71 ("advsync: Add
>>>>>> memory-barriered store-buffering example") needs some context
>>>>>> adjustment.
>>>>>>
>>>>>> Also tweak horizontal spacing of wide tables for one-column layout.
>>>>>> Also add a few words to the footnote giving definition of
>>>>>> __atomic_thread_fence().
>>>>>>
>>>>>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
>>>>>
>>>>> Good catches!  Queued and pushed.  I reworded the footnote a bit, so
>>>>> please let me know if I overdid it.
>>>>
>>>> After your changes in commit 036372ac2573 ("advsync: Use gcc's C11-like
>>>> intrinsics to avoid data races"), this footnote seems verbose, doesn't it?
>>>>
>>>> But, I'm not so much a fan of the changes of your commit.
>>>> It becomes hard to see the relation of lines in litmus tests and rows
>>>> in the tables. Also, those intrinsics have fairly large overheads.
>>>
>>> They certainly are ugly, no two ways about that!  ;-)
>>>
>>>> How about using "volatile" in thread arg declaration such as the following?
>>>>
>>>> ---
>>>> C C-SB+o-o+o-o
>>>> {
>>>> }
>>>>
>>>> P0(volatile int *x0, volatile int *x1)
>>>> {
>>>> 	int r2;
>>>>
>>>> 	*x0 = 2;
>>>> 	r2 = *x1;
>>>> }
>>>>
>>>>
>>>> P1(volatile int *x0, volatile int *x1)
>>>> {
>>>> 	int r2;
>>>>
>>>> 	*x1 = 2;
>>>> 	r2 = *x0;
>>>> }
>>>>
>>>> exists (1:r2=0 /\ 0:r2=0)
>>>> ---
>>>>
>>>> If all you need is to prevent memory accesses from being optimized away,
>>>> they should suffice. But they might be unpopular among kernel community.
>>>
>>> To say nothing of their unpopularity among the C11 and C++11
>>> communities!

Ah, I kind of anticipated it...

>>>
>>>> I checked the generated C code in cross-compiling mode of litmus7, and
>>>> the volatile-ness is reflected there.
>>>
>>> And it also works just fine without the volatile -- the litmus7 tool
>>> does the translation so as to avoid destructive compiler optimizations.
>>>
>>> I am checking with the litmus7 people to see if there is any way to
>>> map identifiers.  Some of the other tools support a "-macros"
>>> command-line argument, which would allow mapping from "smp_mb()" to
>>> "__atomic_thread_fence(__ATOMIC_SEQ_CST)", but not litmus7.
>>>
>>> So I cannot go with "volatile", but let's see if I can do something
>>> better than the gcc intrinsics.
>>
>> And if the litmus7 guys don't have anything for me, I can always write a
>> script to do the conversions.  So either way, we will have kernel-style
>> notation at some point.  ;-)
> 
> And it turns out that the contents of a second "{}" in a litmus7 test
> is pulled directly into the output code, so an easy change.  ;-)

Good news!

So __atomic_thread_fence() will also be replaced with smp_mb() in
the litmus tests? That will reduce the width of the tables and eliminate
the need for the hspace adjustments in one-column layout.

                Thanks, Akira
> 
> 							Thanx, Paul
> 
> 



* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-05 22:15           ` Akira Yokosawa
@ 2017-07-05 22:32             ` Paul E. McKenney
  2017-07-05 22:40               ` Akira Yokosawa
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2017-07-05 22:32 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Thu, Jul 06, 2017 at 07:15:24AM +0900, Akira Yokosawa wrote:
> On 2017/07/05 10:32:49 -0700, Paul E. McKenney wrote:

[ . . . ]

> >>> So I cannot go with "volatile", but let's see if I can do something
> >>> better than the gcc intrinsics.
> >>
> >> And if the litmus7 guys don't have anything for me, I can always write a
> >> script to do the conversions.  So either way, we will have kernel-style
> >> notation at some point.  ;-)
> > 
> > And it turns out that the contents of a second "{}" in a litmus7 test
> > is pulled directly into the output code, so an easy change.  ;-)
> 
> Good news!
> 
> So __atomic_thread_fence() will also be replaced with smp_mb() in
> the litmus tests? That will reduce the width of the tables and eliminate
> the need for the hspace adjustments in one-column layout.

Oops, I forgot to push!  Done now.

And yes, there is now smp_mb() in the litmus tests and the tables.

							Thanx, Paul



* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-05 22:32             ` Paul E. McKenney
@ 2017-07-05 22:40               ` Akira Yokosawa
  2017-07-06  0:46                 ` Paul E. McKenney
  2017-07-06 12:09                 ` Akira Yokosawa
  0 siblings, 2 replies; 13+ messages in thread
From: Akira Yokosawa @ 2017-07-05 22:40 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook, Akira Yokosawa

On 2017/07/06 7:32, Paul E. McKenney wrote:
> On Thu, Jul 06, 2017 at 07:15:24AM +0900, Akira Yokosawa wrote:
>> On 2017/07/05 10:32:49 -0700, Paul E. McKenney wrote:
> 
> [ . . . ]
> 
>>>>> So I cannot go with "volatile", but let's see if I can do something
>>>>> better than the gcc intrinsics.
>>>>
>>>> And if the litmus7 guys don't have anything for me, I can always write a
>>>> script to do the conversions.  So either way, we will have kernel-style
>>>> notation at some point.  ;-)
>>>
>>> And it turns out that the contents of a second "{}" in a litmus7 test
>>> is pulled directly into the output code, so an easy change.  ;-)
>>
>> Good news!
>>
>> So __atomic_thread_fence() will also be replaced with smp_mb() in
>> the litmus tests? That will reduce the width of the tables and eliminate
>> the need for the hspace adjustments in one-column layout.
> 
> Oops, I forgot to push!  Done now.
> 
> And yes, there is now smp_mb() in the litmus tests and the tables.

I'll take a look later.
As I mentioned earlier, __atomic_load_n() and __atomic_store_n() seem
to have fairly large overheads (on x86). You might want to see the changes
in the resulting statistics and reflect them in the text.

          Thanks, Akira

> 
> 							Thanx, Paul
> 
> 



* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-05 22:40               ` Akira Yokosawa
@ 2017-07-06  0:46                 ` Paul E. McKenney
  2017-07-06  1:07                   ` Akira Yokosawa
  2017-07-06 12:09                 ` Akira Yokosawa
  1 sibling, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2017-07-06  0:46 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Thu, Jul 06, 2017 at 07:40:45AM +0900, Akira Yokosawa wrote:
> On 2017/07/06 7:32, Paul E. McKenney wrote:
> > On Thu, Jul 06, 2017 at 07:15:24AM +0900, Akira Yokosawa wrote:
> >> On 2017/07/05 10:32:49 -0700, Paul E. McKenney wrote:
> > 
> > [ . . . ]
> > 
> >>>>> So I cannot go with "volatile", but let's see if I can do something
> >>>>> better than the gcc intrinsics.
> >>>>
> >>>> And if the litmus7 guys don't have anything for me, I can always write a
> >>>> script to do the conversions.  So either way, we will have kernel-style
> >>>> notation at some point.  ;-)
> >>>
> >>> And it turns out that the contents of a second "{}" in a litmus7 test
> >>> is pulled directly into the output code, so an easy change.  ;-)
> >>
> >> Good news!
> >>
> >> So __atomic_thread_fence() will also be replaced with smp_mb() in
> >> the litmus tests? That will reduce the width of the tables and eliminate
> >> the need for the hspece adjustments in one-column layout.
> > 
> > Oops, I forgot to push!  Done now.
> > 
> > And yes, there is now smp_mb() in the litmus tests and the tables.
> 
> I'll take a look later.
> As I mentioned earlier, __atomic_load_n() and __atomic_store_n() seem
> to have fairly large overheads (on x86). You might want to see the changes
> in the resulting statistics and reflect them in the text.

I am getting a fair amount of variance on the results, and thus a
fair amount of overlap in the number of times each scenario shows up.
Perhaps I should say that and also state that different hardware will
likely get different results?

							Thanx, Paul



* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-06  0:46                 ` Paul E. McKenney
@ 2017-07-06  1:07                   ` Akira Yokosawa
  0 siblings, 0 replies; 13+ messages in thread
From: Akira Yokosawa @ 2017-07-06  1:07 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook, Akira Yokosawa

On 2017/07/06 9:46, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

>> On Thu, Jul 06, 2017 at 07:40:45AM +0900, Akira Yokosawa wrote:
>>> On 2017/07/06 7:32, Paul E. McKenney wrote:
>>>> On Thu, Jul 06, 2017 at 07:15:24AM +0900, Akira Yokosawa wrote:
>>>> On 2017/07/05 10:32:49 -0700, Paul E. McKenney wrote:
>>> 
>>> [ . . . ]
>>> 
>>>>>>> So I cannot go with "volatile", but let's see if I can do something
>>>>>>> better than the gcc intrinsics.
>>>>>> 
>>>>>> And if the litmus7 guys don't have anything for me, I can always write a
>>>>>> script to do the conversions.  So either way, we will have kernel-style
>>>>>> notation at some point.  ;-)
>>>>> 
>>>>> And it turns out that the contents of a second "{}" in a litmus7 test
>>>>> is pulled directly into the output code, so an easy change.  ;-)
>>>> 
>>>> Good news!
>>>> 
>>>> So __atomic_thread_fence() will also be replaced with smp_mb() in
>>>> the litmus tests? That will reduce the width of the tables and eliminate
>>>> the need for the hspace adjustments in one-column layout.
>>> 
>>> Oops, I forgot to push!  Done now.
>>> 
>>> And yes, there is now smp_mb() in the litmus tests and the tables.
>> 
>> I'll take a look later.
>> As I mentioned earlier, __atomic_load_n() and __atomic_store_n() seem
>> to have fairly large overheads (on x86). You might want to see the changes
>> in the resulting statistics and reflect them in the text.
> 
> I am getting a fair amount of variance on the results, and thus a
> fair amount of overlap in the number of times each scenario shows up.
> Perhaps I should say that and also state that different hardware will
> likely get different results?
> 
So the large overhead I saw might have been caused by disturbance from the virtual guest machine. I didn't run the test repeatedly after the change.

If you don't see any significant difference in the result, there is no need to modify the text.

But mentioning the possibility of fluctuations would be helpful.

Sorry for the noise.

Thanks, Akira 
(from mobile, might be QP encoded)
>                            Thanx, Paul
> 



* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-05 22:40               ` Akira Yokosawa
  2017-07-06  0:46                 ` Paul E. McKenney
@ 2017-07-06 12:09                 ` Akira Yokosawa
  2017-07-06 18:20                   ` Paul E. McKenney
  1 sibling, 1 reply; 13+ messages in thread
From: Akira Yokosawa @ 2017-07-06 12:09 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

On 2017/07/06 7:40 +0900, Akira Yokosawa wrote:
> On 2017/07/06 7:32, Paul E. McKenney wrote:
>> On Thu, Jul 06, 2017 at 07:15:24AM +0900, Akira Yokosawa wrote:
>>> On 2017/07/05 10:32:49 -0700, Paul E. McKenney wrote:
>>
>> [ . . . ]
>>
>>>>>> So I cannot go with "volatile", but let's see if I can do something
>>>>>> better than the gcc intrinsics.
>>>>>
>>>>> And if the litmus7 guys don't have anything for me, I can always write a
>>>>> script to do the conversions.  So either way, we will have kernel-style
>>>>> notation at some point.  ;-)
>>>>
>>>> And it turns out that the contents of a second "{}" in a litmus7 test
>>>> is pulled directly into the output code, so an easy change.  ;-)
>>>
>>> Good news!
>>>
>>> So __atomic_thread_fence() will also be replaced with smp_mb() in
>>> the litmus tests? That will reduce the width of the tables and eliminate
>>>> the need for the hspace adjustments in one-column layout.
>>
>> Oops, I forgot to push!  Done now.
>>
>> And yes, there is now smp_mb() in the litmus tests and the tables.
> 
> I'll take a look later.
> As I mentioned earlier, __atomic_load_n() and __atomic_store_n() seem
> to have fairly large overheads (on x86). You might want to see the changes
> in the resulting statistics and reflect them in the text.

Hi Paul,

The definitions of READ_ONCE() and WRITE_ONCE() given in the updated litmus
tests are inconsistent with the kernel API: they accept pointers such
as x0 and x1.

In the article at LWN (https://lwn.net/Articles/720550/), the litmus tests use the
same definitions as the kernel API: they take value expressions such as *x0 and *x1.

My preference is to accept pointers as arguments, as C functions do,
but the kernel community has chosen otherwise. We should be consistent here.

So the definition should be:

---
#define READ_ONCE(x) __atomic_load_n(&(x), __ATOMIC_RELAXED)
#define WRITE_ONCE(x, v) __atomic_store_n(&(x), (v), __ATOMIC_RELAXED)
#define smp_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
---

Or am I misreading?

                Thanks, akira

> 
>           Thanks, Akira
> 
>>
>> 							Thanx, Paul
>>
>>
> 
> 



* Re: [PATCH] advsync: Fix store-buffering sequence table
  2017-07-06 12:09                 ` Akira Yokosawa
@ 2017-07-06 18:20                   ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2017-07-06 18:20 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Thu, Jul 06, 2017 at 09:09:05PM +0900, Akira Yokosawa wrote:
> On 2017/07/06 7:40 +0900, Akira Yokosawa wrote:
> > On 2017/07/06 7:32, Paul E. McKenney wrote:
> >> On Thu, Jul 06, 2017 at 07:15:24AM +0900, Akira Yokosawa wrote:
> >>> On 2017/07/05 10:32:49 -0700, Paul E. McKenney wrote:
> >>
> >> [ . . . ]
> >>
> >>>>>> So I cannot go with "volatile", but let's see if I can do something
> >>>>>> better than the gcc intrinsics.
> >>>>>
> >>>>> And if the litmus7 guys don't have anything for me, I can always write a
> >>>>> script to do the conversions.  So either way, we will have kernel-style
> >>>>> notation at some point.  ;-)
> >>>>
> >>>> And it turns out that the contents of a second "{}" in a litmus7 test
> >>>> is pulled directly into the output code, so an easy change.  ;-)
> >>>
> >>> Good news!
> >>>
> >>> So __atomic_thread_fence() will also be replaced with smp_mb() in
> >>> the litmus tests? That will reduce the width of the tables and eliminate
> >>> the need for the hspace adjustments in one-column layout.
> >>
> >> Oops, I forgot to push!  Done now.
> >>
> >> And yes, there is now smp_mb() in the litmus tests and the tables.
> > 
> > I'll take a look later.
> > As I mentioned earlier, __atomic_load_n() and __atomic_store_n() seem
> > to have fairly large overheads (on x86). You might want to see the changes
> > in the resulting statistics and reflect them in the text.
> 
> Hi Paul,
> 
> The definitions of READ_ONCE() and WRITE_ONCE() given in the updated litmus
> tests are inconsistent with the kernel API: they accept pointers such
> as x0 and x1.
> 
> In the article at LWN (https://lwn.net/Articles/720550/), the litmus tests use the
> same definitions as the kernel API: they take value expressions such as *x0 and *x1.
> 
> My preference is to accept pointers as arguments, as C functions do,
> but the kernel community has chosen otherwise. We should be consistent here.
> 
> So the definition should be:
> 
> ---
> #define READ_ONCE(x) __atomic_load_n(&(x), __ATOMIC_RELAXED)
> #define WRITE_ONCE(x, v) __atomic_store_n(&(x), (v), __ATOMIC_RELAXED)
> #define smp_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
> ---
> 
> Or am I misreading?

No, you are correct!  I added a commit to fix this, and also added
a disclaimer as you suggested in your earlier email.

							Thanx, Paul



end of thread, other threads:[~2017-07-06 18:20 UTC | newest]

Thread overview: 13+ messages
-- links below jump to the message on this page --
2017-07-04 15:23 [PATCH] advsync: Fix store-buffering sequence table Akira Yokosawa
2017-07-04 22:21 ` Paul E. McKenney
2017-07-05 14:22   ` Akira Yokosawa
2017-07-05 15:40     ` Paul E. McKenney
2017-07-05 15:50       ` Paul E. McKenney
2017-07-05 17:32         ` Paul E. McKenney
2017-07-05 22:15           ` Akira Yokosawa
2017-07-05 22:32             ` Paul E. McKenney
2017-07-05 22:40               ` Akira Yokosawa
2017-07-06  0:46                 ` Paul E. McKenney
2017-07-06  1:07                   ` Akira Yokosawa
2017-07-06 12:09                 ` Akira Yokosawa
2017-07-06 18:20                   ` Paul E. McKenney
