* [PATCH 0/3] polish `Why Memory Barriers?` appendix
@ 2016-03-20  0:24 SeongJae Park
  2016-03-20  0:24 ` [PATCH 1/3] whymb: fix trivial typos SeongJae Park
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: SeongJae Park @ 2016-03-20  0:24 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook, SeongJae Park

This patchset polishes the `Why Memory Barriers?` appendix by fixing trivial
nitpicks that were found while translating it.  It fixes trivial and evident
typos, makes term usage consistent, and finally proposes removing a sentence
that looks outdated.

SeongJae Park (3):
  whymb: fix trivial typos
  whymb: s/write buffer/store buffer
  whymb: remove ARM's short multi processor history description

 appendix/whymb/whymemorybarriers.tex | 92 ++++++++++++++++++------------------
 1 file changed, 45 insertions(+), 47 deletions(-)

-- 
1.9.1



* [PATCH 1/3] whymb: fix trivial typos
  2016-03-20  0:24 [PATCH 0/3] polish `Why Memory Barriers?` appendix SeongJae Park
@ 2016-03-20  0:24 ` SeongJae Park
  2016-03-20  0:24 ` [PATCH 2/3] whymb: s/write buffer/store buffer SeongJae Park
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: SeongJae Park @ 2016-03-20  0:24 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook, SeongJae Park

This commit fixes trivial typos in the `whymemorybarriers.tex` file.  They
are missing tildes, a few grammatical errors, a misplaced sentence-ending
period, and an evident typo (s/HIPS/MIPS).

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
 appendix/whymb/whymemorybarriers.tex | 86 ++++++++++++++++++------------------
 1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/appendix/whymb/whymemorybarriers.tex b/appendix/whymb/whymemorybarriers.tex
index 856961f..8025bec 100644
--- a/appendix/whymb/whymemorybarriers.tex
+++ b/appendix/whymb/whymemorybarriers.tex
@@ -618,10 +618,10 @@ to a given item of data, its performance for the first write to
 a given cache line is quite poor.
 To see this, consider
 Figure~\ref{fig:app:whymb:Writes See Unnecessary Stalls},
-which shows a timeline of a write by CPU 0 to a cacheline held in
-CPU 1's cache.
-Since CPU 0 must wait for the cache line to arrive before it can
-write to it, CPU 0 must stall for an extended period of time.\footnote{
+which shows a timeline of a write by CPU~0 to a cacheline held in
+CPU~1's cache.
+Since CPU~0 must wait for the cache line to arrive before it can
+write to it, CPU~0 must stall for an extended period of time.\footnote{
 	The time required to transfer a cache line from one CPU's cache
 	to another's is typically a few orders of magnitude more than
 	that required to execute a simple register-to-register instruction.}
@@ -635,9 +635,9 @@ write to it, CPU 0 must stall for an extended period of time.\footnote{
 \label{fig:app:whymb:Writes See Unnecessary Stalls}
 \end{figure}

-But there is no real reason to force CPU 0 to stall for so long --- after
-all, regardless of what data happens to be in the cache line that CPU 1
-sends it, CPU 0 is going to unconditionally overwrite it.
+But there is no real reason to force CPU~0 to stall for so long --- after
+all, regardless of what data happens to be in the cache line that CPU~1
+sends it, CPU~0 is going to unconditionally overwrite it.

 \subsection{Store Buffers}
 \label{sec:app:whymb:Store Buffers}
@@ -645,9 +645,9 @@ sends it, CPU 0 is going to unconditionally overwrite it.
 One way to prevent this unnecessary stalling of writes is to add
 ``store buffers'' between each CPU and its cache, as shown in
 Figure~\ref{fig:app:whymb:Caches With Store Buffers}.
-With the addition of these store buffers, CPU 0 can simply record
+With the addition of these store buffers, CPU~0 can simply record
 its write in its store buffer and continue executing.
-When the cache line does finally make its way from CPU 1 to CPU 0,
+When the cache line does finally make its way from CPU~1 to CPU~0,
 the data will be moved from the store buffer to the cache line.

 \QuickQuiz{}
@@ -711,26 +711,26 @@ Figure~\ref{fig:app:whymb:Caches With Store Buffers},
 one would be surprised.
 Such a system could potentially see the following sequence of events:
 \begin{enumerate}
-\item	CPU 0 starts executing the \co{a = 1}.
-\item	CPU 0 looks ``a'' up in the cache, and finds that it is missing.
-\item	CPU 0 therefore sends a ``read invalidate'' message in order to
+\item	CPU~0 starts executing the \co{a = 1}.
+\item	CPU~0 looks ``a'' up in the cache, and finds that it is missing.
+\item	CPU~0 therefore sends a ``read invalidate'' message in order to
 	get exclusive ownership of the cache line containing ``a''.
-\item	CPU 0 records the store to ``a'' in its store buffer.
-\item	CPU 1 receives the ``read invalidate'' message, and responds
+\item	CPU~0 records the store to ``a'' in its store buffer.
+\item	CPU~1 receives the ``read invalidate'' message, and responds
 	by transmitting the cache line and removing that cacheline from
 	its cache.
-\item	CPU 0 starts executing the \co{b = a + 1}.
-\item	CPU 0 receives the cache line from CPU 1, which still has
+\item	CPU~0 starts executing the \co{b = a + 1}.
+\item	CPU~0 receives the cache line from CPU~1, which still has
 	a value of zero for ``a''.
-\item	CPU 0 loads ``a'' from its cache, finding the value zero.
+\item	CPU~0 loads ``a'' from its cache, finding the value zero.
 	\label{item:app:whymb:Need Store Buffer}
-\item	CPU 0 applies the entry from its store buffer to the newly
+\item	CPU~0 applies the entry from its store buffer to the newly
 	arrived cache line, setting the value of ``a'' in its cache
 	to one.
-\item	CPU 0 adds one to the value zero loaded for ``a'' above,
+\item	CPU~0 adds one to the value zero loaded for ``a'' above,
 	and stores it into the cache line containing ``b''
-	(which we will assume is already owned by CPU 0).
-\item	CPU 0 executes \co{assert(b == 2)}, which fails.
+	(which we will assume is already owned by CPU~0).
+\item	CPU~0 executes \co{assert(b == 2)}, which fails.
 \end{enumerate}

 The problem is that we have two copies of ``a'', one in the cache and
@@ -788,7 +788,7 @@ with variables ``a'' and ``b'' initially zero:

 Suppose CPU~0 executes foo() and CPU~1 executes bar().
 Suppose further that the cache line containing ``a'' resides only in CPU~1's
-cache, and that the cache line containing ``b'' is owned by CPU 0.
+cache, and that the cache line containing ``b'' is owned by CPU~0.
 Then the sequence of operations might be as follows:
 \begin{enumerate}
 \item	CPU~0 executes \co{a = 1}.  The cache line is not in
@@ -1366,9 +1366,9 @@ Each of ``a'', ``b'', and ``c'' are initially zero.
 \small
 \begin{center}
 \begin{tabular}{l|l|l}
-	\multicolumn{1}{c|}{CPU 0} &
-		\multicolumn{1}{c|}{CPU 1} &
-			\multicolumn{1}{c}{CPU 2} \\
+	\multicolumn{1}{c|}{CPU~0} &
+		\multicolumn{1}{c|}{CPU~1} &
+			\multicolumn{1}{c}{CPU~2} \\
 	\hline
 	\hline
 	\co{a = 1;}	 &		& \\
@@ -1427,9 +1427,9 @@ Both ``a'' and ``b'' are initially zero.
 \small
 \begin{center}
 \begin{tabular}{l|l|l}
-	\multicolumn{1}{c|}{CPU 0} &
-		\multicolumn{1}{c|}{CPU 1} &
-			\multicolumn{1}{c}{CPU 2} \\
+	\multicolumn{1}{c|}{CPU~0} &
+		\multicolumn{1}{c|}{CPU~1} &
+			\multicolumn{1}{c}{CPU~2} \\
 	\hline
 	\hline
 	\co{a = 1;} & \co{while (a == 0)}; & \\
@@ -1470,9 +1470,9 @@ All variables are initially zero.
 \scriptsize
 \begin{center}
 \begin{tabular}{r|l|l|l}
-	& \multicolumn{1}{c|}{CPU 0} &
-		\multicolumn{1}{c|}{CPU 1} &
-			\multicolumn{1}{c}{CPU 2} \\
+	& \multicolumn{1}{c|}{CPU~0} &
+		\multicolumn{1}{c|}{CPU~1} &
+			\multicolumn{1}{c}{CPU~2} \\
 	\hline
 	\hline
  1 &	\co{a = 1;} &			& \\
@@ -1521,7 +1521,7 @@ Therefore, CPU~2's assertion on line~9 is guaranteed \emph{not} to fire.
 	Table~\ref{tab:app:whymb:Memory Barrier Example 3},
 	would this assert ever trigger?
 \QuickQuizAnswer{
-	The result depends on whether the CPU supports ``transitivity.''
+	The result depends on whether the CPU supports ``transitivity''.
 	In other words, CPU~0 stored to ``e'' after seeing CPU~1's
 	store to ``c'', with a memory barrier between CPU~0's load
 	from ``c'' and store to ``e''.
@@ -1728,7 +1728,7 @@ Figure~\ref{fig:app:whymb:Insert and Lock-Free Search}.
 This {\tt smp\_wmb()} on line~9 of this figure
 guarantees that the element initialization
 in lines 6-8 is executed before the element is added to the
-list on line 10, so that the lock-free search will work correctly.
+list on line~10, so that the lock-free search will work correctly.
 That is, it makes this guarantee on all CPUs {\em except} Alpha.

 \begin{figure}
@@ -1767,25 +1767,25 @@ That is, it makes this guarantee on all CPUs {\em except} Alpha.
 \end{figure}

 Alpha has extremely weak memory ordering
-such that the code on line 20 of
+such that the code on line~20 of
 Figure~\ref{fig:app:whymb:Insert and Lock-Free Search} could see the old
-garbage values that were present before the initialization on lines 6-8.
+garbage values that were present before the initialization on lines~6-8.

 Figure~\ref{fig:app:whymb:Why smp-read-barrier-depends() is Required}
 shows how this can happen on
 an aggressively parallel machine with partitioned caches, so that
-alternating caches lines are processed by the different partitions
+alternating cache lines are processed by the different partitions
 of the caches.
 Assume that the list header {\tt head} will be processed by cache bank~0,
 and that the new element will be processed by cache bank~1.
 On Alpha, the {\tt smp\_wmb()} will guarantee that the cache invalidates performed
-by lines 6-8 of
+by lines~6-8 of
 Figure~\ref{fig:app:whymb:Insert and Lock-Free Search} will reach
-the interconnect before that of line 10 does, but
+the interconnect before that of line~10 does, but
 makes absolutely no guarantee about the order in which the new values will
 reach the reading CPU's core.
-For example, it is possible that the reading CPU's cache bank 1 is very
-busy, but cache bank 0 is idle.
+For example, it is possible that the reading CPU's cache bank~1 is very
+busy, but cache bank~0 is idle.
 This could result in the cache invalidates for the new element being
 delayed, so that the reading CPU gets the new value for the pointer,
 but sees the old cached values for the new element.
@@ -1976,8 +1976,8 @@ different set of memory-barrier instructions~\cite{ARMv7A:2010}:
 	pipeline, so that all instructions following the \co{ISB}
 	are fetched only after the \co{ISB} completes.
 	For example, if you are writing a self-modifying program
-	(such as a JIT), you should execute an \co{ISB} after
-	between generating the code and executing it.
+	(such as a JIT), you should execute an \co{ISB} between
+	generating the code and executing it.
 \end{enumerate}

 None of these instructions exactly match the semantics of Linux's
@@ -2108,7 +2108,7 @@ definition of transitivity or cumulativity similar to that of
 ARM and Power.
 However, it appears that different MIPS implementations can have
 different memory-ordering properties, so it is important to consult
-the documentation for the specific HIPS implementation you are using.
+the documentation for the specific MIPS implementation you are using.

 \subsection{PA-RISC}

-- 
1.9.1



* [PATCH 2/3] whymb: s/write buffer/store buffer
  2016-03-20  0:24 [PATCH 0/3] polish `Why Memory Barriers?` appendix SeongJae Park
  2016-03-20  0:24 ` [PATCH 1/3] whymb: fix trivial typos SeongJae Park
@ 2016-03-20  0:24 ` SeongJae Park
  2016-03-20  0:24 ` [PATCH 3/3] whymb: remove ARM's short multi processor history description SeongJae Park
  2016-03-20 15:53 ` [PATCH 0/3] polish `Why Memory Barriers?` appendix Paul E. McKenney
  3 siblings, 0 replies; 5+ messages in thread
From: SeongJae Park @ 2016-03-20  0:24 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook, SeongJae Park

The `Why Memory Barriers?` appendix mostly uses the term `store buffer`.
However, a few sentences use another term, `write buffer`.
Change those to `store buffer` for consistent terminology.

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
 appendix/whymb/whymemorybarriers.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/appendix/whymb/whymemorybarriers.tex b/appendix/whymb/whymemorybarriers.tex
index 8025bec..38ffad6 100644
--- a/appendix/whymb/whymemorybarriers.tex
+++ b/appendix/whymb/whymemorybarriers.tex
@@ -2232,10 +2232,10 @@ thus fully ordering memory operations.

 So, why is {\tt membar \#MemIssue} needed?
 Because a {\tt membar \#StoreLoad} could permit a subsequent
-load to get its value from a write buffer, which would be
+load to get its value from a store buffer, which would be
 disastrous if the write was to an MMIO register that induced side effects
 on the value to be read.
-In contrast, {\tt membar \#MemIssue} would wait until the write buffers
+In contrast, {\tt membar \#MemIssue} would wait until the store buffers
 were flushed before permitting the loads to execute,
 thereby ensuring that the load actually gets its value from the MMIO register.
 Drivers could instead use {\tt membar \#Sync}, but the lighter-weight
-- 
1.9.1



* [PATCH 3/3] whymb: remove ARM's short multi processor history description
  2016-03-20  0:24 [PATCH 0/3] polish `Why Memory Barriers?` appendix SeongJae Park
  2016-03-20  0:24 ` [PATCH 1/3] whymb: fix trivial typos SeongJae Park
  2016-03-20  0:24 ` [PATCH 2/3] whymb: s/write buffer/store buffer SeongJae Park
@ 2016-03-20  0:24 ` SeongJae Park
  2016-03-20 15:53 ` [PATCH 0/3] polish `Why Memory Barriers?` appendix Paul E. McKenney
  3 siblings, 0 replies; 5+ messages in thread
From: SeongJae Park @ 2016-03-20  0:24 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook, SeongJae Park

The description of the ARM processor says that multiprocessor ARM CPUs
have existed for more than five years.  However, that sentence was written
in 2010 by commit 864762cb5206f31b71757e4da8362d8c1c0e3b7c ("Add ARM to the
"why memory barriers" section.").  It's 2016 now.  Multiprocessor ARM
CPUs are common and their history spans more than a decade.  The sentence
could simply be changed to say ten years rather than five.  However, it
would be better to remove it, because updating the number every year
would be painful and, arguably, a decade is not a short period in the
computer industry.  For these reasons, this commit removes the sentence.

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
 appendix/whymb/whymemorybarriers.tex | 2 --
 1 file changed, 2 deletions(-)

diff --git a/appendix/whymb/whymemorybarriers.tex b/appendix/whymb/whymemorybarriers.tex
index 38ffad6..2eef059 100644
--- a/appendix/whymb/whymemorybarriers.tex
+++ b/appendix/whymb/whymemorybarriers.tex
@@ -1949,8 +1949,6 @@ SSE and 3DNOW instructions into account.

 The ARM family of CPUs is extremely popular in embedded applications,
 particularly for power-constrained applications such as cellphones.
-There have nevertheless been multiprocessor implementations of ARM
-for more than five years.
 Its memory model is similar to that of Power
 (see Section~\ref{sec:app:whymb:POWER / PowerPC}, but ARM uses a
 different set of memory-barrier instructions~\cite{ARMv7A:2010}:
-- 
1.9.1



* Re: [PATCH 0/3] polish `Why Memory Barriers?` appendix
  2016-03-20  0:24 [PATCH 0/3] polish `Why Memory Barriers?` appendix SeongJae Park
                   ` (2 preceding siblings ...)
  2016-03-20  0:24 ` [PATCH 3/3] whymb: remove ARM's short multi processor history description SeongJae Park
@ 2016-03-20 15:53 ` Paul E. McKenney
  3 siblings, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2016-03-20 15:53 UTC (permalink / raw)
  To: SeongJae Park; +Cc: perfbook

On Sun, Mar 20, 2016 at 09:24:18AM +0900, SeongJae Park wrote:
> This patchset polishes the `Why Memory Barriers?` appendix by fixing trivial
> nitpicks that were found while translating it.  It fixes trivial and evident
> typos, makes term usage consistent, and finally proposes removing a sentence
> that looks outdated.
> 
> SeongJae Park (3):
>   whymb: fix trivial typos
>   whymb: s/write buffer/store buffer
>   whymb: remove ARM's short multi processor history description

Good eyes, queued and pushed!

							Thanx, Paul

>  appendix/whymb/whymemorybarriers.tex | 92 ++++++++++++++++++------------------
>  1 file changed, 45 insertions(+), 47 deletions(-)
> 
> -- 
> 1.9.1
> 


