* [PATCH 0/3] polish `Why Memory Barriers?` appendix
@ 2016-03-20 0:24 SeongJae Park
2016-03-20 0:24 ` [PATCH 1/3] whymb: fix trivial typos SeongJae Park
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: SeongJae Park @ 2016-03-20 0:24 UTC (permalink / raw)
To: paulmck; +Cc: perfbook, SeongJae Park
This patchset polishes the `Why Memory Barriers?` appendix by fixing trivial
nitpicks that I found while translating it. It fixes trivial and evident typos,
makes term usage consistent, and finally proposes removal of a sentence that
looks outdated.
SeongJae Park (3):
whymb: fix trivial typos
whymb: s/write buffer/store buffer
whymb: remove ARM's short multi processor history description
appendix/whymb/whymemorybarriers.tex | 92 ++++++++++++++++++------------------
1 file changed, 45 insertions(+), 47 deletions(-)
--
1.9.1
* [PATCH 1/3] whymb: fix trivial typos
From: SeongJae Park @ 2016-03-20 0:24 UTC (permalink / raw)
To: paulmck; +Cc: perfbook, SeongJae Park
This commit fixes trivial typos in the `whymemorybarriers.tex` file. The
typos are missing tildes, a few grammatical errors, a misplaced
sentence-ending period, and an evident misspelling (s/HIPS/MIPS).
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
appendix/whymb/whymemorybarriers.tex | 86 ++++++++++++++++++------------------
1 file changed, 43 insertions(+), 43 deletions(-)
diff --git a/appendix/whymb/whymemorybarriers.tex b/appendix/whymb/whymemorybarriers.tex
index 856961f..8025bec 100644
--- a/appendix/whymb/whymemorybarriers.tex
+++ b/appendix/whymb/whymemorybarriers.tex
@@ -618,10 +618,10 @@ to a given item of data, its performance for the first write to
a given cache line is quite poor.
To see this, consider
Figure~\ref{fig:app:whymb:Writes See Unnecessary Stalls},
-which shows a timeline of a write by CPU 0 to a cacheline held in
-CPU 1's cache.
-Since CPU 0 must wait for the cache line to arrive before it can
-write to it, CPU 0 must stall for an extended period of time.\footnote{
+which shows a timeline of a write by CPU~0 to a cacheline held in
+CPU~1's cache.
+Since CPU~0 must wait for the cache line to arrive before it can
+write to it, CPU~0 must stall for an extended period of time.\footnote{
The time required to transfer a cache line from one CPU's cache
to another's is typically a few orders of magnitude more than
that required to execute a simple register-to-register instruction.}
@@ -635,9 +635,9 @@ write to it, CPU 0 must stall for an extended period of time.\footnote{
\label{fig:app:whymb:Writes See Unnecessary Stalls}
\end{figure}
-But there is no real reason to force CPU 0 to stall for so long --- after
-all, regardless of what data happens to be in the cache line that CPU 1
-sends it, CPU 0 is going to unconditionally overwrite it.
+But there is no real reason to force CPU~0 to stall for so long --- after
+all, regardless of what data happens to be in the cache line that CPU~1
+sends it, CPU~0 is going to unconditionally overwrite it.
\subsection{Store Buffers}
\label{sec:app:whymb:Store Buffers}
@@ -645,9 +645,9 @@ sends it, CPU 0 is going to unconditionally overwrite it.
One way to prevent this unnecessary stalling of writes is to add
``store buffers'' between each CPU and its cache, as shown in
Figure~\ref{fig:app:whymb:Caches With Store Buffers}.
-With the addition of these store buffers, CPU 0 can simply record
+With the addition of these store buffers, CPU~0 can simply record
its write in its store buffer and continue executing.
-When the cache line does finally make its way from CPU 1 to CPU 0,
+When the cache line does finally make its way from CPU~1 to CPU~0,
the data will be moved from the store buffer to the cache line.
\QuickQuiz{}
@@ -711,26 +711,26 @@ Figure~\ref{fig:app:whymb:Caches With Store Buffers},
one would be surprised.
Such a system could potentially see the following sequence of events:
\begin{enumerate}
-\item CPU 0 starts executing the \co{a = 1}.
-\item CPU 0 looks ``a'' up in the cache, and finds that it is missing.
-\item CPU 0 therefore sends a ``read invalidate'' message in order to
+\item CPU~0 starts executing the \co{a = 1}.
+\item CPU~0 looks ``a'' up in the cache, and finds that it is missing.
+\item CPU~0 therefore sends a ``read invalidate'' message in order to
get exclusive ownership of the cache line containing ``a''.
-\item CPU 0 records the store to ``a'' in its store buffer.
-\item CPU 1 receives the ``read invalidate'' message, and responds
+\item CPU~0 records the store to ``a'' in its store buffer.
+\item CPU~1 receives the ``read invalidate'' message, and responds
by transmitting the cache line and removing that cacheline from
its cache.
-\item CPU 0 starts executing the \co{b = a + 1}.
-\item CPU 0 receives the cache line from CPU 1, which still has
+\item CPU~0 starts executing the \co{b = a + 1}.
+\item CPU~0 receives the cache line from CPU~1, which still has
a value of zero for ``a''.
-\item CPU 0 loads ``a'' from its cache, finding the value zero.
+\item CPU~0 loads ``a'' from its cache, finding the value zero.
\label{item:app:whymb:Need Store Buffer}
-\item CPU 0 applies the entry from its store buffer to the newly
+\item CPU~0 applies the entry from its store buffer to the newly
arrived cache line, setting the value of ``a'' in its cache
to one.
-\item CPU 0 adds one to the value zero loaded for ``a'' above,
+\item CPU~0 adds one to the value zero loaded for ``a'' above,
and stores it into the cache line containing ``b''
- (which we will assume is already owned by CPU 0).
-\item CPU 0 executes \co{assert(b == 2)}, which fails.
+ (which we will assume is already owned by CPU~0).
+\item CPU~0 executes \co{assert(b == 2)}, which fails.
\end{enumerate}
The problem is that we have two copies of ``a'', one in the cache and
@@ -788,7 +788,7 @@ with variables ``a'' and ``b'' initially zero:
Suppose CPU~0 executes foo() and CPU~1 executes bar().
Suppose further that the cache line containing ``a'' resides only in CPU~1's
-cache, and that the cache line containing ``b'' is owned by CPU 0.
+cache, and that the cache line containing ``b'' is owned by CPU~0.
Then the sequence of operations might be as follows:
\begin{enumerate}
\item CPU~0 executes \co{a = 1}. The cache line is not in
@@ -1366,9 +1366,9 @@ Each of ``a'', ``b'', and ``c'' are initially zero.
\small
\begin{center}
\begin{tabular}{l|l|l}
- \multicolumn{1}{c|}{CPU 0} &
- \multicolumn{1}{c|}{CPU 1} &
- \multicolumn{1}{c}{CPU 2} \\
+ \multicolumn{1}{c|}{CPU~0} &
+ \multicolumn{1}{c|}{CPU~1} &
+ \multicolumn{1}{c}{CPU~2} \\
\hline
\hline
\co{a = 1;} & & \\
@@ -1427,9 +1427,9 @@ Both ``a'' and ``b'' are initially zero.
\small
\begin{center}
\begin{tabular}{l|l|l}
- \multicolumn{1}{c|}{CPU 0} &
- \multicolumn{1}{c|}{CPU 1} &
- \multicolumn{1}{c}{CPU 2} \\
+ \multicolumn{1}{c|}{CPU~0} &
+ \multicolumn{1}{c|}{CPU~1} &
+ \multicolumn{1}{c}{CPU~2} \\
\hline
\hline
\co{a = 1;} & \co{while (a == 0)}; & \\
@@ -1470,9 +1470,9 @@ All variables are initially zero.
\scriptsize
\begin{center}
\begin{tabular}{r|l|l|l}
- & \multicolumn{1}{c|}{CPU 0} &
- \multicolumn{1}{c|}{CPU 1} &
- \multicolumn{1}{c}{CPU 2} \\
+ & \multicolumn{1}{c|}{CPU~0} &
+ \multicolumn{1}{c|}{CPU~1} &
+ \multicolumn{1}{c}{CPU~2} \\
\hline
\hline
1 & \co{a = 1;} & & \\
@@ -1521,7 +1521,7 @@ Therefore, CPU~2's assertion on line~9 is guaranteed \emph{not} to fire.
Table~\ref{tab:app:whymb:Memory Barrier Example 3},
would this assert ever trigger?
\QuickQuizAnswer{
- The result depends on whether the CPU supports ``transitivity.''
+ The result depends on whether the CPU supports ``transitivity''.
In other words, CPU~0 stored to ``e'' after seeing CPU~1's
store to ``c'', with a memory barrier between CPU~0's load
from ``c'' and store to ``e''.
@@ -1728,7 +1728,7 @@ Figure~\ref{fig:app:whymb:Insert and Lock-Free Search}.
This {\tt smp\_wmb()} on line~9 of this figure
guarantees that the element initialization
in lines 6-8 is executed before the element is added to the
-list on line 10, so that the lock-free search will work correctly.
+list on line~10, so that the lock-free search will work correctly.
That is, it makes this guarantee on all CPUs {\em except} Alpha.
\begin{figure}
@@ -1767,25 +1767,25 @@ That is, it makes this guarantee on all CPUs {\em except} Alpha.
\end{figure}
Alpha has extremely weak memory ordering
-such that the code on line 20 of
+such that the code on line~20 of
Figure~\ref{fig:app:whymb:Insert and Lock-Free Search} could see the old
-garbage values that were present before the initialization on lines 6-8.
+garbage values that were present before the initialization on lines~6-8.
Figure~\ref{fig:app:whymb:Why smp-read-barrier-depends() is Required}
shows how this can happen on
an aggressively parallel machine with partitioned caches, so that
-alternating caches lines are processed by the different partitions
+alternating cache lines are processed by the different partitions
of the caches.
Assume that the list header {\tt head} will be processed by cache bank~0,
and that the new element will be processed by cache bank~1.
On Alpha, the {\tt smp\_wmb()} will guarantee that the cache invalidates performed
-by lines 6-8 of
+by lines~6-8 of
Figure~\ref{fig:app:whymb:Insert and Lock-Free Search} will reach
-the interconnect before that of line 10 does, but
+the interconnect before that of line~10 does, but
makes absolutely no guarantee about the order in which the new values will
reach the reading CPU's core.
-For example, it is possible that the reading CPU's cache bank 1 is very
-busy, but cache bank 0 is idle.
+For example, it is possible that the reading CPU's cache bank~1 is very
+busy, but cache bank~0 is idle.
This could result in the cache invalidates for the new element being
delayed, so that the reading CPU gets the new value for the pointer,
but sees the old cached values for the new element.
@@ -1976,8 +1976,8 @@ different set of memory-barrier instructions~\cite{ARMv7A:2010}:
pipeline, so that all instructions following the \co{ISB}
are fetched only after the \co{ISB} completes.
For example, if you are writing a self-modifying program
- (such as a JIT), you should execute an \co{ISB} after
- between generating the code and executing it.
+ (such as a JIT), you should execute an \co{ISB} between
+ generating the code and executing it.
\end{enumerate}
None of these instructions exactly match the semantics of Linux's
@@ -2108,7 +2108,7 @@ definition of transitivity or cumulativity similar to that of
ARM and Power.
However, it appears that different MIPS implementations can have
different memory-ordering properties, so it is important to consult
-the documentation for the specific HIPS implementation you are using.
+the documentation for the specific MIPS implementation you are using.
\subsection{PA-RISC}
--
1.9.1
* [PATCH 2/3] whymb: s/write buffer/store buffer
From: SeongJae Park @ 2016-03-20 0:24 UTC (permalink / raw)
To: paulmck; +Cc: perfbook, SeongJae Park
The `Why Memory Barriers?` appendix uses the term `store buffer`
almost everywhere. However, a few sentences use another term,
`write buffer`. Change these to `store buffer` for consistent term usage.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
appendix/whymb/whymemorybarriers.tex | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/appendix/whymb/whymemorybarriers.tex b/appendix/whymb/whymemorybarriers.tex
index 8025bec..38ffad6 100644
--- a/appendix/whymb/whymemorybarriers.tex
+++ b/appendix/whymb/whymemorybarriers.tex
@@ -2232,10 +2232,10 @@ thus fully ordering memory operations.
So, why is {\tt membar \#MemIssue} needed?
Because a {\tt membar \#StoreLoad} could permit a subsequent
-load to get its value from a write buffer, which would be
+load to get its value from a store buffer, which would be
disastrous if the write was to an MMIO register that induced side effects
on the value to be read.
-In contrast, {\tt membar \#MemIssue} would wait until the write buffers
+In contrast, {\tt membar \#MemIssue} would wait until the store buffers
were flushed before permitting the loads to execute,
thereby ensuring that the load actually gets its value from the MMIO register.
Drivers could instead use {\tt membar \#Sync}, but the lighter-weight
--
1.9.1
* [PATCH 3/3] whymb: remove ARM's short multi processor history description
From: SeongJae Park @ 2016-03-20 0:24 UTC (permalink / raw)
To: paulmck; +Cc: perfbook, SeongJae Park
The description of the ARM family says that multiprocessor ARM
implementations have existed for more than five years. However, that
sentence was written in 2010 by commit
864762cb5206f31b71757e4da8362d8c1c0e3b7c ("Add ARM to the "why memory
barriers" section."). It is 2016 now: multiprocessor ARM CPUs are common,
and their history spans more than a decade. The sentence could simply be
modified to say ten years rather than five. However, it would be better
to remove it, because updating the number every year would be painful
and because a decade is arguably not a short period in the computer
industry. For these reasons, this commit removes the sentence.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
appendix/whymb/whymemorybarriers.tex | 2 --
1 file changed, 2 deletions(-)
diff --git a/appendix/whymb/whymemorybarriers.tex b/appendix/whymb/whymemorybarriers.tex
index 38ffad6..2eef059 100644
--- a/appendix/whymb/whymemorybarriers.tex
+++ b/appendix/whymb/whymemorybarriers.tex
@@ -1949,8 +1949,6 @@ SSE and 3DNOW instructions into account.
The ARM family of CPUs is extremely popular in embedded applications,
particularly for power-constrained applications such as cellphones.
-There have nevertheless been multiprocessor implementations of ARM
-for more than five years.
Its memory model is similar to that of Power
(see Section~\ref{sec:app:whymb:POWER / PowerPC}, but ARM uses a
different set of memory-barrier instructions~\cite{ARMv7A:2010}:
--
1.9.1
* Re: [PATCH 0/3] polish `Why Memory Barriers?` appendix
From: Paul E. McKenney @ 2016-03-20 15:53 UTC (permalink / raw)
To: SeongJae Park; +Cc: perfbook
On Sun, Mar 20, 2016 at 09:24:18AM +0900, SeongJae Park wrote:
> This patchset polishes the `Why Memory Barriers?` appendix by fixing trivial
> nitpicks that I found while translating it. It fixes trivial and evident typos,
> makes term usage consistent, and finally proposes removal of a sentence that
> looks outdated.
>
> SeongJae Park (3):
> whymb: fix trivial typos
> whymb: s/write buffer/store buffer
> whymb: remove ARM's short multi processor history description
Good eyes, queued and pushed!
Thanx, Paul
> appendix/whymb/whymemorybarriers.tex | 92 ++++++++++++++++++------------------
> 1 file changed, 45 insertions(+), 47 deletions(-)
>
> --
> 1.9.1
>