All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] count: Employ new scheme for snippet from count_stat.c
@ 2018-10-02 15:15 Akira Yokosawa
  2018-10-02 15:16 ` [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet Akira Yokosawa
  0 siblings, 1 reply; 6+ messages in thread
From: Akira Yokosawa @ 2018-10-02 15:15 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

From 047e2e2aa09633f5605111119e2f45fd43f6febc Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Tue, 2 Oct 2018 23:54:33 +0900
Subject: [PATCH 1/2] count: Employ new scheme for snippet from count_stat.c

Also add necessary READ_ONCE()/WRITE_ONCE() as par.
the rule of where to use them provided by Paul as an
outline on perfbook mail-list [1].

[1]: https://www.spinics.net/lists/perfbook/msg01788.html

Suggested by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 CodeSamples/count/count_stat.c | 22 +++++++++++++---------
 count/count.tex                | 33 ++++++++-------------------------
 2 files changed, 21 insertions(+), 34 deletions(-)

diff --git a/CodeSamples/count/count_stat.c b/CodeSamples/count/count_stat.c
index 1d72f99..0675320 100644
--- a/CodeSamples/count/count_stat.c
+++ b/CodeSamples/count/count_stat.c
@@ -20,28 +20,32 @@
 
 #include "../api.h"
 
-DEFINE_PER_THREAD(unsigned long, counter);
+//\begin{snippet}[labelbase=ln:count:count_stat:inc-read,commandchars=\\\[\]]
+DEFINE_PER_THREAD(unsigned long, counter);		//\lnlbl{define}
 
-void inc_count(void)
+static __inline__ void inc_count(void)			//\lnlbl{inc:b}
 {
-	__get_thread_var(counter)++;
-}
+	unsigned long *p_counter = &__get_thread_var(counter);
+
+	WRITE_ONCE(*p_counter, *p_counter + 1);
+}							//\lnlbl{inc:e}
 
-__inline__ unsigned long read_count(void)
+static __inline__ unsigned long read_count(void)	//\lnlbl{read:b}
 {
 	int t;
 	unsigned long sum = 0;
 
 	for_each_thread(t)
-		sum += per_thread(counter, t);
+		sum += READ_ONCE(per_thread(counter, t));
 	return sum;
-}
+}							//\lnlbl{read:e}
+//\end{snippet}
 
-__inline__ void count_init(void)
+void count_init(void)
 {
 }
 
-__inline__ void count_cleanup(void)
+void count_cleanup(void)
 {
 }
 
diff --git a/count/count.tex b/count/count.tex
index a7d4c9d..4f760a4 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -477,28 +477,7 @@ thread (presumably cache aligned and padded to avoid false sharing).
 } \QuickQuizEnd
 
 \begin{listing}[tbp]
-{ \scriptsize
-\begin{verbbox}
-  1 DEFINE_PER_THREAD(long, counter);
-  2 
-  3 void inc_count(void)
-  4 {
-  5   __get_thread_var(counter)++;
-  6 }
-  7 
-  8 long read_count(void)
-  9 {
- 10   int t;
- 11   long sum = 0;
- 12 
- 13   for_each_thread(t)
- 14     sum += per_thread(counter, t);
- 15   return sum;
- 16 }
-\end{verbbox}
-}
-\centering
-\theverbbox
+\input{CodeSamples/count/count_stat@inc-read.fcv}
 \caption{Array-Based Per-Thread Statistical Counters}
 \label{lst:count:Array-Based Per-Thread Statistical Counters}
 \end{listing}
@@ -506,22 +485,26 @@ thread (presumably cache aligned and padded to avoid false sharing).
 Such an array can be wrapped into per-thread primitives, as shown in
 Listing~\ref{lst:count:Array-Based Per-Thread Statistical Counters}
 (\path{count_stat.c}).
-Line~1 defines an array containing a set of per-thread counters of
+\begin{lineref}[ln:count:count_stat:inc-read]
+Line~\lnref{define} defines an array containing a set of per-thread counters of
 type \co{long} named, creatively enough, \co{counter}.
 
-Lines~3-6 show a function that increments the counters, using the
+Lines~\lnref{inc:b}-\lnref{inc:e}
+show a function that increments the counters, using the
 \co{__get_thread_var()} primitive to locate the currently running
 thread's element of the \co{counter} array.
 Because this element is modified only by the corresponding thread,
 non-atomic increment suffices.
 
-Lines~8-16 show a function that reads out the aggregate value of the counter,
+Lines~\lnref{read:b}-\lnref{read:e}
+show a function that reads out the aggregate value of the counter,
 using the \co{for_each_thread()} primitive to iterate over the list of
 currently running threads, and using the \co{per_thread()} primitive
 to fetch the specified thread's counter.
 Because the hardware can fetch and store a properly aligned \co{long}
 atomically, and because \GCC\ is kind enough to make use of this capability,
 normal loads suffice, and no special atomic instructions are required.
+\end{lineref}
 
 \QuickQuiz{}
 	What other choice does \GCC\ have, anyway???
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet
  2018-10-02 15:15 [PATCH 1/2] count: Employ new scheme for snippet from count_stat.c Akira Yokosawa
@ 2018-10-02 15:16 ` Akira Yokosawa
  2018-10-02 15:24   ` Akira Yokosawa
  0 siblings, 1 reply; 6+ messages in thread
From: Akira Yokosawa @ 2018-10-02 15:16 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

From a9c276c3da87ed327d003a70d60777f41999b4fc Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Wed, 3 Oct 2018 00:08:49 +0900
Subject: [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet

In the actual code of count_stat.c, the per-thread variable
"counter" has the type "unsigned long".

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 count/count.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/count/count.tex b/count/count.tex
index 4f760a4..9f9fd0f 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -487,7 +487,7 @@ Listing~\ref{lst:count:Array-Based Per-Thread Statistical Counters}
 (\path{count_stat.c}).
 \begin{lineref}[ln:count:count_stat:inc-read]
 Line~\lnref{define} defines an array containing a set of per-thread counters of
-type \co{long} named, creatively enough, \co{counter}.
+type \co{unsigned long} named, creatively enough, \co{counter}.
 
 Lines~\lnref{inc:b}-\lnref{inc:e}
 show a function that increments the counters, using the
@@ -501,7 +501,7 @@ show a function that reads out the aggregate value of the counter,
 using the \co{for_each_thread()} primitive to iterate over the list of
 currently running threads, and using the \co{per_thread()} primitive
 to fetch the specified thread's counter.
-Because the hardware can fetch and store a properly aligned \co{long}
+Because the hardware can fetch and store a properly aligned \co{unsigned long}
 atomically, and because \GCC\ is kind enough to make use of this capability,
 normal loads suffice, and no special atomic instructions are required.
 \end{lineref}
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet
  2018-10-02 15:16 ` [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet Akira Yokosawa
@ 2018-10-02 15:24   ` Akira Yokosawa
  2018-10-02 18:21     ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Akira Yokosawa @ 2018-10-02 15:24 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

On 2018/10/03 00:16:12 +0900, Akira Yokosawa wrote:
> From a9c276c3da87ed327d003a70d60777f41999b4fc Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa <akiyks@gmail.com>
> Date: Wed, 3 Oct 2018 00:08:49 +0900
> Subject: [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet
> 
> In the actual code of count_stat.c, the per-thread variable
> "counter" has the type "unsigned long".
> 

And the other description on Listing 5.3 in the text needs some reworking
to reflect the changes in the snippet. Paul, can you have a look into
this?

        Thanks, Akira

> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> ---
>  count/count.tex | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/count/count.tex b/count/count.tex
> index 4f760a4..9f9fd0f 100644
> --- a/count/count.tex
> +++ b/count/count.tex
> @@ -487,7 +487,7 @@ Listing~\ref{lst:count:Array-Based Per-Thread Statistical Counters}
>  (\path{count_stat.c}).
>  \begin{lineref}[ln:count:count_stat:inc-read]
>  Line~\lnref{define} defines an array containing a set of per-thread counters of
> -type \co{long} named, creatively enough, \co{counter}.
> +type \co{unsigned long} named, creatively enough, \co{counter}.
>  
>  Lines~\lnref{inc:b}-\lnref{inc:e}
>  show a function that increments the counters, using the
> @@ -501,7 +501,7 @@ show a function that reads out the aggregate value of the counter,
>  using the \co{for_each_thread()} primitive to iterate over the list of
>  currently running threads, and using the \co{per_thread()} primitive
>  to fetch the specified thread's counter.
> -Because the hardware can fetch and store a properly aligned \co{long}
> +Because the hardware can fetch and store a properly aligned \co{unsigned long}
>  atomically, and because \GCC\ is kind enough to make use of this capability,
>  normal loads suffice, and no special atomic instructions are required.
>  \end{lineref}
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet
  2018-10-02 15:24   ` Akira Yokosawa
@ 2018-10-02 18:21     ` Paul E. McKenney
  2018-10-02 22:12       ` Akira Yokosawa
  0 siblings, 1 reply; 6+ messages in thread
From: Paul E. McKenney @ 2018-10-02 18:21 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Wed, Oct 03, 2018 at 12:24:17AM +0900, Akira Yokosawa wrote:
> On 2018/10/03 00:16:12 +0900, Akira Yokosawa wrote:
> > From a9c276c3da87ed327d003a70d60777f41999b4fc Mon Sep 17 00:00:00 2001
> > From: Akira Yokosawa <akiyks@gmail.com>
> > Date: Wed, 3 Oct 2018 00:08:49 +0900
> > Subject: [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet
> > 
> > In the actual code of count_stat.c, the per-thread variable
> > "counter" has the type "unsigned long".

Looks good, so I applied and pushed both patches, thank you!

> And the other description on Listing 5.3 in the text needs some reworking
> to reflect the changes in the snippet. Paul, can you have a look into
> this?

How does the patch below look?

							Thanx, Paul

------------------------------------------------------------------------

commit cc0409cfb0581924b05744a15fa3ebd5c1e298af
Author: Paul E. McKenney <paulmck@linux.ibm.com>
Date:   Tue Oct 2 11:19:33 2018 -0700

    count: Update code description and QQ based on {READ,WRITE}_ONCE()
    
    Reported-by: Akira Yokosawa <akiyks@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

diff --git a/count/count.tex b/count/count.tex
index 9f9fd0f64838..c36e7d826a7b 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -495,23 +495,22 @@ show a function that increments the counters, using the
 thread's element of the \co{counter} array.
 Because this element is modified only by the corresponding thread,
 non-atomic increment suffices.
-
-Lines~\lnref{read:b}-\lnref{read:e}
-show a function that reads out the aggregate value of the counter,
-using the \co{for_each_thread()} primitive to iterate over the list of
-currently running threads, and using the \co{per_thread()} primitive
-to fetch the specified thread's counter.
-Because the hardware can fetch and store a properly aligned \co{unsigned long}
-atomically, and because \GCC\ is kind enough to make use of this capability,
-normal loads suffice, and no special atomic instructions are required.
-\end{lineref}
+However, this code uses \co{WRITE_ONCE()} to prevent destructive compiler
+optimizations.
+For but one example, the compiler is within its rights to use a
+to-be-stored-to location as temporary storage, thus writing what
+would be for all intents and purposes garbage to that location
+just before doing the desired store.
+This could of course be rather confusing to anything attempting to
+read out the count.
+The use of \co{WRITE_ONCE()} prevents this optimization and others besides.
 
 \QuickQuiz{}
-	What other choice does \GCC\ have, anyway???
+	What other nasty optimizations could \GCC\ apply?
 \QuickQuizAnswer{
-	According to the C standard, the effects of fetching a variable
-	that might be concurrently modified by some other thread are
-	undefined.
+	According to the C standard, the effects of doing a normal store
+	to a variable that might be concurrently loaded by some other
+	thread are undefined.
 	It turns out that the C standard really has no other choice,
 	given that C must support (for example) eight-bit architectures
 	which are incapable of atomically loading a \co{long}.
@@ -540,8 +539,10 @@ normal loads suffice, and no special atomic instructions are required.
 	It is therefore reasonable to expect that any Linux-kernel
 	adoption of C11 atomics will be incremental at best.
 
-	So why not just use plain old C-language assignment statements
-	to access shared variables?
+	But if the code is doing loads and stores that the underlying
+	hardware can implement with single instructions, why not just
+	use plain old C-language assignment statements to access shared
+	variables?
 	The problem is that the compiler is within its rights to assume
 	that no other thread is modifying the variable being accessed.
 	Given a plain old C-language load, the compiler would therefore
@@ -581,6 +582,22 @@ normal loads suffice, and no special atomic instructions are required.
 	coding standards.
 } \QuickQuizEnd
 
+Lines~\lnref{read:b}-\lnref{read:e}
+show a function that reads out the aggregate value of the counter,
+using the \co{for_each_thread()} primitive to iterate over the list of
+currently running threads, and using the \co{per_thread()} primitive
+to fetch the specified thread's counter.
+This code also uses \co{READ_ONCE()} to ensure that the compiler doesn't
+optimize these loads into oblivion.
+For but one example, a pair of consecutive calls to \co{read_count()}
+might be inlined, and an intrepid optimizer might notice that the same
+locations were being summed and thus incorrectly conclude that it would
+be simply wonderful to sum them once and use the resulting value twice.
+This sort of optimization might be rather frustrating to people expecting
+later \co{read_count()} calls to return larger values.
+The use of \co{READ_ONCE()} prevents this optimization and others besides.
+\end{lineref}
+
 \QuickQuiz{}
 	How does the per-thread \co{counter} variable in
 	Listing~\ref{lst:count:Array-Based Per-Thread Statistical Counters}


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet
  2018-10-02 18:21     ` Paul E. McKenney
@ 2018-10-02 22:12       ` Akira Yokosawa
  2018-10-04  3:40         ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Akira Yokosawa @ 2018-10-02 22:12 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

On 2018/10/02 11:21:23 -0700, Paul E. McKenney wrote:
> On Wed, Oct 03, 2018 at 12:24:17AM +0900, Akira Yokosawa wrote:
>> On 2018/10/03 00:16:12 +0900, Akira Yokosawa wrote:
>>> From a9c276c3da87ed327d003a70d60777f41999b4fc Mon Sep 17 00:00:00 2001
>>> From: Akira Yokosawa <akiyks@gmail.com>
>>> Date: Wed, 3 Oct 2018 00:08:49 +0900
>>> Subject: [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet
>>>
>>> In the actual code of count_stat.c, the per-thread variable
>>> "counter" has the type "unsigned long".
> 
> Looks good, so I applied and pushed both patches, thank you!
> 
>> And the other description on Listing 5.3 in the text needs some reworking
>> to reflect the changes in the snippet. Paul, can you have a look into
>> this?
> 
> How does the patch below look?

Looks good to me (at least for the moment).

Acked-by: Akira Yokosawa <akiyks@gmail.com>

And I'm anticipating your expansion on the use of READ_ONCE()/WRITE_ONCE()
in, say, somewhere in toolsoftrade. Then we would be able to reduce
repetitive explanation here and there.

I'll continue to apply new scheme for other snippets in "count" chapter.

        Thanks, Akira

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit cc0409cfb0581924b05744a15fa3ebd5c1e298af
> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> Date:   Tue Oct 2 11:19:33 2018 -0700
> 
>     count: Update code description and QQ based on {READ,WRITE}_ONCE()
>     
>     Reported-by: Akira Yokosawa <akiyks@gmail.com>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> 
> diff --git a/count/count.tex b/count/count.tex
> index 9f9fd0f64838..c36e7d826a7b 100644
> --- a/count/count.tex
> +++ b/count/count.tex
> @@ -495,23 +495,22 @@ show a function that increments the counters, using the
>  thread's element of the \co{counter} array.
>  Because this element is modified only by the corresponding thread,
>  non-atomic increment suffices.
> -
> -Lines~\lnref{read:b}-\lnref{read:e}
> -show a function that reads out the aggregate value of the counter,
> -using the \co{for_each_thread()} primitive to iterate over the list of
> -currently running threads, and using the \co{per_thread()} primitive
> -to fetch the specified thread's counter.
> -Because the hardware can fetch and store a properly aligned \co{unsigned long}
> -atomically, and because \GCC\ is kind enough to make use of this capability,
> -normal loads suffice, and no special atomic instructions are required.
> -\end{lineref}
> +However, this code uses \co{WRITE_ONCE()} to prevent destructive compiler
> +optimizations.
> +For but one example, the compiler is within its rights to use a
> +to-be-stored-to location as temporary storage, thus writing what
> +would be for all intents and purposes garbage to that location
> +just before doing the desired store.
> +This could of course be rather confusing to anything attempting to
> +read out the count.
> +The use of \co{WRITE_ONCE()} prevents this optimization and others besides.
>  
>  \QuickQuiz{}
> -	What other choice does \GCC\ have, anyway???
> +	What other nasty optimizations could \GCC\ apply?
>  \QuickQuizAnswer{
> -	According to the C standard, the effects of fetching a variable
> -	that might be concurrently modified by some other thread are
> -	undefined.
> +	According to the C standard, the effects of doing a normal store
> +	to a variable that might be concurrently loaded by some other
> +	thread are undefined.
>  	It turns out that the C standard really has no other choice,
>  	given that C must support (for example) eight-bit architectures
>  	which are incapable of atomically loading a \co{long}.
> @@ -540,8 +539,10 @@ normal loads suffice, and no special atomic instructions are required.
>  	It is therefore reasonable to expect that any Linux-kernel
>  	adoption of C11 atomics will be incremental at best.
>  
> -	So why not just use plain old C-language assignment statements
> -	to access shared variables?
> +	But if the code is doing loads and stores that the underlying
> +	hardware can implement with single instructions, why not just
> +	use plain old C-language assignment statements to access shared
> +	variables?
>  	The problem is that the compiler is within its rights to assume
>  	that no other thread is modifying the variable being accessed.
>  	Given a plain old C-language load, the compiler would therefore
> @@ -581,6 +582,22 @@ normal loads suffice, and no special atomic instructions are required.
>  	coding standards.
>  } \QuickQuizEnd
>  
> +Lines~\lnref{read:b}-\lnref{read:e}
> +show a function that reads out the aggregate value of the counter,
> +using the \co{for_each_thread()} primitive to iterate over the list of
> +currently running threads, and using the \co{per_thread()} primitive
> +to fetch the specified thread's counter.
> +This code also uses \co{READ_ONCE()} to ensure that the compiler doesn't
> +optimize these loads into oblivion.
> +For but one example, a pair of consecutive calls to \co{read_count()}
> +might be inlined, and an intrepid optimizer might notice that the same
> +locations were being summed and thus incorrectly conclude that it would
> +be simply wonderful to sum them once and use the resulting value twice.
> +This sort of optimization might be rather frustrating to people expecting
> +later \co{read_count()} calls to return larger values.
> +The use of \co{READ_ONCE()} prevents this optimization and others besides.
> +\end{lineref}
> +
>  \QuickQuiz{}
>  	How does the per-thread \co{counter} variable in
>  	Listing~\ref{lst:count:Array-Based Per-Thread Statistical Counters}
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet
  2018-10-02 22:12       ` Akira Yokosawa
@ 2018-10-04  3:40         ` Paul E. McKenney
  0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2018-10-04  3:40 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Wed, Oct 03, 2018 at 07:12:32AM +0900, Akira Yokosawa wrote:
> On 2018/10/02 11:21:23 -0700, Paul E. McKenney wrote:
> > On Wed, Oct 03, 2018 at 12:24:17AM +0900, Akira Yokosawa wrote:
> >> On 2018/10/03 00:16:12 +0900, Akira Yokosawa wrote:
> >>> From a9c276c3da87ed327d003a70d60777f41999b4fc Mon Sep 17 00:00:00 2001
> >>> From: Akira Yokosawa <akiyks@gmail.com>
> >>> Date: Wed, 3 Oct 2018 00:08:49 +0900
> >>> Subject: [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet
> >>>
> >>> In the actual code of count_stat.c, the per-thread variable
> >>> "counter" has the type "unsigned long".
> > 
> > Looks good, so I applied and pushed both patches, thank you!
> > 
> >> And the other description on Listing 5.3 in the text needs some reworking
> >> to reflect the changes in the snippet. Paul, can you have a look into
> >> this?
> > 
> > How does the patch below look?
> 
> Looks good to me (at least for the moment).
> 
> Acked-by: Akira Yokosawa <akiyks@gmail.com>
> 
> And I'm anticipating your expansion on the use of READ_ONCE()/WRITE_ONCE()
> in, say, somewhere in toolsoftrade. Then we would be able to reduce
> repetitive explanation here and there.

I am working on this, but there is a surprising amount to say.  So it
might take some time.  On the other hand, with the rework of the
memory-ordering chapter, this part of the Tools of the Trade chapter
seems to be the weakest part of the book, so it certainly makes sense
for me to focus there a bit.

> I'll continue to apply new scheme for other snippets in "count" chapter.

Very good, looking forward to seeing them!

							Thanx, Paul

>         Thanks, Akira
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit cc0409cfb0581924b05744a15fa3ebd5c1e298af
> > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > Date:   Tue Oct 2 11:19:33 2018 -0700
> > 
> >     count: Update code description and QQ based on {READ,WRITE}_ONCE()
> >     
> >     Reported-by: Akira Yokosawa <akiyks@gmail.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > 
> > diff --git a/count/count.tex b/count/count.tex
> > index 9f9fd0f64838..c36e7d826a7b 100644
> > --- a/count/count.tex
> > +++ b/count/count.tex
> > @@ -495,23 +495,22 @@ show a function that increments the counters, using the
> >  thread's element of the \co{counter} array.
> >  Because this element is modified only by the corresponding thread,
> >  non-atomic increment suffices.
> > -
> > -Lines~\lnref{read:b}-\lnref{read:e}
> > -show a function that reads out the aggregate value of the counter,
> > -using the \co{for_each_thread()} primitive to iterate over the list of
> > -currently running threads, and using the \co{per_thread()} primitive
> > -to fetch the specified thread's counter.
> > -Because the hardware can fetch and store a properly aligned \co{unsigned long}
> > -atomically, and because \GCC\ is kind enough to make use of this capability,
> > -normal loads suffice, and no special atomic instructions are required.
> > -\end{lineref}
> > +However, this code uses \co{WRITE_ONCE()} to prevent destructive compiler
> > +optimizations.
> > +For but one example, the compiler is within its rights to use a
> > +to-be-stored-to location as temporary storage, thus writing what
> > +would be for all intents and purposes garbage to that location
> > +just before doing the desired store.
> > +This could of course be rather confusing to anything attempting to
> > +read out the count.
> > +The use of \co{WRITE_ONCE()} prevents this optimization and others besides.
> >  
> >  \QuickQuiz{}
> > -	What other choice does \GCC\ have, anyway???
> > +	What other nasty optimizations could \GCC\ apply?
> >  \QuickQuizAnswer{
> > -	According to the C standard, the effects of fetching a variable
> > -	that might be concurrently modified by some other thread are
> > -	undefined.
> > +	According to the C standard, the effects of doing a normal store
> > +	to a variable that might be concurrently loaded by some other
> > +	thread are undefined.
> >  	It turns out that the C standard really has no other choice,
> >  	given that C must support (for example) eight-bit architectures
> >  	which are incapable of atomically loading a \co{long}.
> > @@ -540,8 +539,10 @@ normal loads suffice, and no special atomic instructions are required.
> >  	It is therefore reasonable to expect that any Linux-kernel
> >  	adoption of C11 atomics will be incremental at best.
> >  
> > -	So why not just use plain old C-language assignment statements
> > -	to access shared variables?
> > +	But if the code is doing loads and stores that the underlying
> > +	hardware can implement with single instructions, why not just
> > +	use plain old C-language assignment statements to access shared
> > +	variables?
> >  	The problem is that the compiler is within its rights to assume
> >  	that no other thread is modifying the variable being accessed.
> >  	Given a plain old C-language load, the compiler would therefore
> > @@ -581,6 +582,22 @@ normal loads suffice, and no special atomic instructions are required.
> >  	coding standards.
> >  } \QuickQuizEnd
> >  
> > +Lines~\lnref{read:b}-\lnref{read:e}
> > +show a function that reads out the aggregate value of the counter,
> > +using the \co{for_each_thread()} primitive to iterate over the list of
> > +currently running threads, and using the \co{per_thread()} primitive
> > +to fetch the specified thread's counter.
> > +This code also uses \co{READ_ONCE()} to ensure that the compiler doesn't
> > +optimize these loads into oblivion.
> > +For but one example, a pair of consecutive calls to \co{read_count()}
> > +might be inlined, and an intrepid optimizer might notice that the same
> > +locations were being summed and thus incorrectly conclude that it would
> > +be simply wonderful to sum them once and use the resulting value twice.
> > +This sort of optimization might be rather frustrating to people expecting
> > +later \co{read_count()} calls to return larger values.
> > +The use of \co{READ_ONCE()} prevents this optimization and others besides.
> > +\end{lineref}
> > +
> >  \QuickQuiz{}
> >  	How does the per-thread \co{counter} variable in
> >  	Listing~\ref{lst:count:Array-Based Per-Thread Statistical Counters}
> > 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-10-04 10:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-02 15:15 [PATCH 1/2] count: Employ new scheme for snippet from count_stat.c Akira Yokosawa
2018-10-02 15:16 ` [PATCH 2/2] count: Adjust type of variable 'counter' with code snippet Akira Yokosawa
2018-10-02 15:24   ` Akira Yokosawa
2018-10-02 18:21     ` Paul E. McKenney
2018-10-02 22:12       ` Akira Yokosawa
2018-10-04  3:40         ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.