All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture
@ 2020-03-14 15:14 Akira Yokosawa
  2020-03-14 15:15 ` [PATCH 2/4] memorder: Use \dots for ellipsis Akira Yokosawa
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Akira Yokosawa @ 2020-03-14 15:14 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

From d90bfe96b76995a125c9ba0f8461d5f0acc138ec Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 14 Mar 2020 00:06:33 +0900
Subject: [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture

Substitute "Arm" for "ARM" to respect the decision of Arm Limited.
Instead of direct substitutions, define macros \ARM and \ARMv.
This should help us easily catch up in case Arm changes its mind.

Note that as far as an argument of a macro is a single digit,
enclosing it by "{}" is not necessary.
For example, \ARMv{8} CPU" and "\ARMv8 CPU" will generate the same
result: "Armv8".

Some of "ARM" in ppcmem.tex are kept unchanged as the PPCMEM site
stilluses "ARM" as its interface choice.

Also update the legal page to mention trademarks of Arm, MIPS,
and SPARC.  Update the notice on Intel trademarks as well.
"x386" is not a trademark of Intel anymore.

While we are here, get rid of \mytexttrademark and \mytextregistered
as they have been empty ever since they were introduced in commit
eecdeac7367c ("Remove trademark and registered symbols in text").

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 SMPdesign/beyond.tex      |  6 ++--
 datastruct/datastruct.tex |  5 ++-
 formal/axiomatic.tex      |  2 +-
 formal/ppcmem.tex         | 22 ++++++-------
 future/formalregress.tex  |  2 +-
 legal.tex                 | 12 +++++--
 memorder/memorder.tex     | 66 +++++++++++++++++++--------------------
 perfbook.tex              |  6 ++--
 8 files changed, 61 insertions(+), 60 deletions(-)

diff --git a/SMPdesign/beyond.tex b/SMPdesign/beyond.tex
index 12e9237a..2eac3150 100644
--- a/SMPdesign/beyond.tex
+++ b/SMPdesign/beyond.tex
@@ -218,8 +218,7 @@ attempts to record cells in the \co{->visited[]} array.
 \end{fcvref}
 
 This approach does provide significant speedups on a dual-CPU
-Lenovo\mytexttrademark\ W500
-running at 2.53\,GHz, as shown in
+Lenovo W500 running at 2.53\,GHz, as shown in
 Figure~\ref{fig:SMPdesign:CDF of Solution Times For SEQ and PWQ},
 which shows the cumulative distribution functions (CDFs) for the solution
 times of the two algorithms, based on the solution of 500 different square
@@ -602,8 +601,7 @@ the solution line.
 This disappointing performance compared to results in
 Figure~\ref{fig:SMPdesign:Varying Maze Size vs. COPART}
 is due to the less-tightly integrated hardware available in the
-larger and older Xeon\mytextregistered\
-system running at 2.66\,GHz.
+larger and older Xeon system running at 2.66\,GHz.
 
 \subsection{Future Directions and Conclusions}
 \label{sec:SMPdesign:Future Directions and Conclusions}
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index e21952bc..ad9a80f2 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -319,9 +319,8 @@ The \co{hashtab_free()} function on
 \end{figure}
 
 The performance results for an eight-CPU 2\,GHz
-Intel\mytextregistered\
-Xeon\mytextregistered\
-system using a bucket-locked hash table with 1024 buckets are shown in
+Intel Xeon system using a bucket-locked hash table
+with 1024 buckets are shown in
 Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo}.
 The performance does scale nearly linearly, but is not much more than half
 of the ideal performance level, even at only eight CPUs.
diff --git a/formal/axiomatic.tex b/formal/axiomatic.tex
index 10bb33e9..26e3f1ac 100644
--- a/formal/axiomatic.tex
+++ b/formal/axiomatic.tex
@@ -419,7 +419,7 @@ in the \co{herd} output.
 	\begin{fcvref}[ln:formal:C-RomanPenyaev-list-rcu-rr:whole]
 	That is an excellent question.
 	As of late 2018, the answer is ``no one knows''.
-	Much depends on the semantics of ARMv8's conditional-move
+	Much depends on the semantics of \ARMv8's conditional-move
 	instruction.
 	While awaiting clarity on these semantics, \co{smp_store_release()}
 	is the safe choice.
diff --git a/formal/ppcmem.tex b/formal/ppcmem.tex
index 184cdd0d..d88e16c2 100644
--- a/formal/ppcmem.tex
+++ b/formal/ppcmem.tex
@@ -29,9 +29,9 @@ Peter Sewell and Susmit Sarkar at the University of Cambridge, Luc
 Maranget, Francesco Zappa Nardelli, and Pankaj Pawan at INRIA, and Jade
 Alglave at Oxford University, in cooperation with Derek Williams of
 IBM~\cite{JadeAlglave2011ppcmem}.
-This group formalized the memory models of Power, ARM, x86, as well
+This group formalized the memory models of Power, \ARM, x86, as well
 as that of the C/C++11 standard~\cite{PeteBecker2011N3242}, and
-produced the PPCMEM tool based on the Power and ARM formalizations.
+produced the PPCMEM tool based on the Power and \ARM\ formalizations.
 
 \QuickQuiz{}
 	But x86 has strong memory ordering!  Why would you need to
@@ -60,7 +60,7 @@ discusses the implications.
 
 An example PowerPC litmus test for PPCMEM is shown in
 \cref{lst:formal:PPCMEM Litmus Test}.
-The ARM interface works exactly the same way, but with ARM instructions
+The ARM interface works exactly the same way, but with \ARM\ instructions
 substituted for the Power instructions and with the initial ``PPC''
 replaced by ``ARM''. You can select the ARM interface by clicking on
 ``Change to ARM Model'' at the web page called out above.
@@ -164,7 +164,7 @@ runs tests on actual hardware. Perhaps more importantly, a large number of
 pre-existing litmus tests are available with the online tool (available
 via the ``Select ARM Test'' and ``Select POWER Test'' buttons). It is
 quite likely that one of these pre-existing litmus tests will answer
-your Power or ARM memory-ordering question.
+your Power or \ARM\ memory-ordering question.
 
 \subsection{What Does This Litmus Test Mean?}
 \label{sec:formal:What Does This Litmus Test Mean?}
@@ -173,8 +173,8 @@ P0's \clnref{reginit,stw} are equivalent to the C statement \co{x=1}
 because \clnref{init:0} defines P0's register \co{r2} to be the address
 of \co{x}. P0's \clnref{P0lwarx,P0stwcx} are the mnemonics for
 load-linked (``load register
-exclusive'' in ARM parlance and ``load reserve'' in Power parlance)
-and store-conditional (``store register exclusive'' in ARM parlance),
+exclusive'' in \ARM\ parlance and ``load reserve'' in Power parlance)
+and store-conditional (``store register exclusive'' in \ARM\ parlance),
 respectively. When these are used together, they form an atomic
 instruction sequence, roughly similar to the compare-and-swap sequences
 exemplified by the x86 \co{lock;cmpxchg} instruction. Moving to a higher
@@ -319,9 +319,9 @@ Therefore, the model predicts that the offending execution sequence
 cannot happen.
 
 \QuickQuiz{}
-	Does the ARM Linux kernel have a similar bug?
+	Does the \ARM\ Linux kernel have a similar bug?
 \QuickQuizAnswer{
-	ARM does not have this particular bug because that it places
+	\ARM\ does not have this particular bug because it places
 	\co{smp_mb()} before and after the \co{atomic_add_return()}
 	function's assembly-language implementation.
 	PowerPC no longer has this bug; it has long since been
@@ -363,12 +363,12 @@ cannot happen.
 \label{sec:formal:PPCMEM Discussion}
 
 These tools promise to be of great help to people working on low-level
-parallel primitives that run on ARM and on Power. These tools do have
+parallel primitives that run on \ARM\ and on Power. These tools do have
 some intrinsic limitations:
 
 \begin{enumerate}
 \item	These tools are research prototypes, and as such are unsupported.
-\item	These tools do not constitute official statements by IBM or ARM
+\item	These tools do not constitute official statements by IBM or \ARM\
 	on their respective CPU architectures. For example, both
 	corporations reserve the right to report a bug at any time against
 	any version of any of these tools. These tools are therefore not a
@@ -383,7 +383,7 @@ some intrinsic limitations:
 	may vary. In particular, the tool handles only word-sized accesses
 	(32 bits), and the words accessed must be properly aligned. In
 	addition, the tool does not handle some of the weaker variants
-	of the ARM memory-barrier instructions, nor does it handle arithmetic.
+	of the \ARM\ memory-barrier instructions, nor does it handle arithmetic.
 \item	The tools are restricted to small loop-free code fragments
 	running on small numbers of threads. Larger examples result
 	in state-space explosion, just as with similar tools such as
diff --git a/future/formalregress.tex b/future/formalregress.tex
index 9bfcdd2d..1677c722 100644
--- a/future/formalregress.tex
+++ b/future/formalregress.tex
@@ -125,7 +125,7 @@ good match for modern computer systems, as was seen in
 \cref{chp:Advanced Synchronization: Memory Ordering}.
 In contrast, one of the great strengths of PPCMEM and \co{herd}
 is their detailed modeling of various CPU families memory models,
-including x86, ARM, Power, and, in the case of \co{herd},
+including x86, \ARM, Power, and, in the case of \co{herd},
 even a Linux-kernel memory model~\cite{Alglave:2018:FSC:3173162.3177156},
 which has been accepted into version 4.17 of
 the Linux kernel.
diff --git a/legal.tex b/legal.tex
index 21b9263f..d4648c7f 100644
--- a/legal.tex
+++ b/legal.tex
@@ -13,8 +13,16 @@ Trademarks:
 	of International Business Machines Corporation in the United
 	States, other countries, or both.
 \item	Linux is a registered trademark of Linus Torvalds.
-\item	i386 is a trademark of Intel Corporation or its subsidiaries
-	in the United States, other countries, or both.
+\item	Intel, Itanium, Intel Core, and Intel Xeon are trademarks
+	of Intel Corporation or its subsidiaries in the United States,
+	other countries, or both.
+\item	Arm is a registered trademark of Arm Limited (or its subsidiaries)
+	in the US and/or elsewhere.
+\item	MIPS is a registered trademark of Wave, Inc. in the United States
+	and other countries.
+\item	SPARC is a registered trademark of SPARC International, Inc.
+	Products bearing SPARC trademarks are based on an architecture
+	developed by Sun Microsystems, Inc.
 \item	Other company, product, and service names may be trademarks or
 	service marks of such companies.
 \end{itemize}
diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 01355730..63f1474b 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -330,7 +330,7 @@ synchronization primitives (such as locking and RCU)
 that are responsible for maintaining the illusion of ordering through use of
 \emph{memory barriers} (for example, \co{smp_mb()} in the Linux kernel).
 These memory barriers can be explicit instructions, as they are on
-ARM, \Power{}, Itanium, and Alpha, or they can be implied by other instructions,
+\ARM, \Power{}, Itanium, and Alpha, or they can be implied by other instructions,
 as they often are on x86.
 Since these standard synchronization primitives preserve the illusion of
 ordering, your path of least resistance is to simply use these primitives,
@@ -1332,7 +1332,7 @@ in pre-v4.15 Linux kernels.
 	To sum up, current platforms either respect address dependencies
 	implicitly, as is the case for TSO platforms (x86, mainframe,
 	SPARC,~...), have hardware tracking for address dependencies
-	(ARM, PowerPC, MIPS,~...), have the required memory barriers
+	(\ARM, PowerPC, MIPS,~...), have the required memory barriers
 	supplied by \co{READ_ONCE()} (DEC Alpha in Linux kernel v4.15 and
 	later), or require the memory barriers supplied by
 	\co{rcu_dereference()} (DEC Alpha in Linux kernel v4.14 and earlier).
@@ -1582,7 +1582,7 @@ instead provided the slightly weaker
 \emph{other-multicopy atomicity}~\cite[Section B2.3]{ARMv8A:2017},
 which excludes the CPU doing a given store from the requirement that all
 CPUs agree on the order of all stores.\footnote{
-	As of late 2018, ARMv8 and x86 provide other-multicopy atomicity,
+	As of late 2018, \ARMv8 and x86 provide other-multicopy atomicity,
 	IBM mainframe provides fully multicopy atomicity, and PPC does
 	not provide multicopy atomicity at all. More detail is shown in
 	\cref{tab:memorder:Summary of Memory Ordering}.}
@@ -2071,7 +2071,7 @@ This should not come as a surprise to anyone who carefully examined
 	\cref{lst:memorder:2+2W Litmus Test With Write Barriers}
 	(\path{C-2+2W+o-wmb-o+o-wmb-o.litmus}),
 	research shows that the cycle is prohibited, even in weakly
-	ordered systems such as ARM and Power~\cite{test6-pdf}.
+	ordered systems such as \ARM\ and Power~\cite{test6-pdf}.
 	Given that, are store-to-store really \emph{always}
 	counter-temporal???
 \QuickQuizAnswer{
@@ -3720,8 +3720,8 @@ to what can be done based on individual memory accesses.
 	\cmidrule{3-11}
 	\multicolumn{2}{c}{\raisebox{.5ex}{Property}}
 	& \cpufml{Alpha}
-	& \cpufml{ARMv7-A/R}
-	& \cpufml{ARMv8}
+	& \cpufml{\ARMv7-A/R}
+	& \cpufml{\ARMv8}
 	& \cpufml{Itanium}
 	& \cpufml{MIPS}
 	& \cpufml{\Power{}}
@@ -4041,7 +4041,7 @@ in Alpha's heyday.
 One could place an \co{smp_rmb()} primitive
 between the pointer fetch and dereference in order to force Alpha
 to order the pointer fetch with the later dependent load.
-However, this imposes unneeded overhead on systems (such as ARM,
+However, this imposes unneeded overhead on systems (such as \ARM,
 Itanium, PPC, and SPARC) that respect data dependencies on the read side.
 A \co{smp_read_barrier_depends()} primitive has therefore been added to the
 Linux kernel to eliminate overhead on these systems, and was also added
@@ -4150,13 +4150,13 @@ an \co{smp_mb()} rather than a no-op.
 
 For more on Alpha, see its reference manual~\cite{ALPHA2002}.
 
-\subsection{ARMv7-A/R}
+\subsection{\ARMv7-A/R}
 \label{sec:memorder:ARMv7-A/R}
 
-The ARM family of CPUs is extremely popular in embedded applications,
+The \ARM\ family of CPUs is extremely popular in embedded applications,
 particularly for power-constrained applications such as cellphones.
 Its memory model is similar to that of \Power{}
-(see \cref{sec:memorder:POWER / PowerPC}), but ARM uses a
+(see \cref{sec:memorder:POWER / PowerPC}), but \ARM\ uses a
 different set of memory-barrier instructions~\cite{ARMv7A:2010}:
 
 \begin{description}
@@ -4166,7 +4166,7 @@ different set of memory-barrier instructions~\cite{ARMv7A:2010}:
 	The ``type'' of operations can be all operations or can be
 	restricted to only writes (similar to the Alpha \co{wmb}
 	and the \Power{} \co{eieio} instructions).
-	In addition, ARM allows cache coherence to have one of three
+	In addition, \ARM\ allows cache coherence to have one of three
 	scopes: single processor, a subset of the processors
 	(``inner'') and global (``outer'').
 \item	[\tco{DSB}] (data synchronization barrier) causes the specified
@@ -4175,7 +4175,7 @@ different set of memory-barrier instructions~\cite{ARMv7A:2010}:
 	The ``type'' of operations is the same as that of \co{DMB}.
 	The \co{DSB} instruction was called \co{DWB} (drain write buffer
 	or data write barrier, your choice) in early versions of the
-	ARM architecture.
+	\ARM\ architecture.
 \item	[\tco{ISB}] (instruction synchronization barrier) flushes the CPU
 	pipeline, so that all instructions following the \co{ISB}
 	are fetched only after the \co{ISB} completes.
@@ -4194,7 +4194,7 @@ stronger than
 \cref{sec:memorder:Cumulativity}'s
 variant of cumulativity.
 
-ARM also implements control dependencies, so that if a conditional
+\ARM\ also implements control dependencies, so that if a conditional
 branch depends on a load, then any store executed after that conditional
 branch will be ordered after the load.
 However, loads following the conditional branch will \emph{not}
@@ -4218,7 +4218,7 @@ r3 = z;			\lnlbl[z2]
 In this example, load-store control dependency ordering causes
 the load from \co{x} on \clnref{x} to be ordered before the store to
 \co{y} on \clnref{y}.
-However, ARM does not respect load-load control dependencies, so that
+However, \ARM\ does not respect load-load control dependencies, so that
 the load on \clnref{x} might well happen \emph{after} the
 load on \clnref{z1}.
 On the other hand, the combination of the conditional branch on \clnref{if}
@@ -4228,7 +4228,7 @@ Note that inserting an additional \co{ISB} instruction somewhere between
 \clnref{nop,y} would enforce ordering between \clnref{x,z1}.
 \end{fcvref}
 
-\subsection{ARMv8}
+\subsection{\ARMv8}
 
 \begin{figure}[tb]
 \centering
@@ -4237,29 +4237,29 @@ Note that inserting an additional \co{ISB} instruction somewhere between
 \ContributedBy{Figure}{fig:memorder:Half Memory Barrier}{Melissa Brossard}
 \end{figure}
 
-ARMv8 is ARM's new CPU family~\cite{ARMv8A:2017}
+\ARMv8 is \ARM's new CPU family~\cite{ARMv8A:2017}
 which includes 64-bit capabilities,
 in contrast to their 32-bit-only CPU described in
 \cref{sec:memorder:ARMv7-A/R}.
-ARMv8's memory model closely resembles its ARMv7 counterpart,
+\ARMv8's memory model closely resembles its \ARMv7 counterpart,
 but adds load-acquire (\co{LDLARB}, \co{LDLARH}, and \co{LDLAR})
 and store-release (\co{STLLRB}, \co{STLLRH}, and \co{STLLR})
 instructions.
 These instructions act as ``half memory barriers'', so that
-ARMv8 CPUs can reorder previous accesses with a later \co{LDLAR}
+\ARMv8 CPUs can reorder previous accesses with a later \co{LDLAR}
 instruction, but are prohibited from reordering an earlier \co{LDLAR}
 instruction with later accesses, as fancifully depicted in
 \cref{fig:memorder:Half Memory Barrier}.
-Similarly, ARMv8 CPUs can reorder an earlier \co{STLLR} instruction with
+Similarly, \ARMv8 CPUs can reorder an earlier \co{STLLR} instruction with
 a subsequent access, but are prohibited from reordering
 previous accesses with a later \co{STLLR} instruction.
 As one might expect, this means that these instructions directly support
 the C11 notion of load-acquire and store-release.
 
-However, ARMv8 goes well beyond the C11 memory model by mandating that
+However, \ARMv8 goes well beyond the C11 memory model by mandating that
 the combination of a store-release and load-acquire act as a full
 barrier under many circumstances.
-For example, in ARMv8, given a store followed by a store-release followed
+For example, in \ARMv8, given a store followed by a store-release followed
 a load-acquire followed by a load, all to different variables and all from
 a single CPU, all CPUs
 would agree that the initial store preceded the final load.
@@ -4267,12 +4267,12 @@ Interestingly enough, most TSO architectures (including x86 and the
 mainframe) do not make this guarantee, as the two loads could be
 reordered before the two stores.
 
-ARMv8 is one of only two architectures that needs the
+\ARMv8 is one of only two architectures that needs the
 \co{smp_mb__after_spinlock()} primitive to be a full barrier,
 due to its relatively weak lock-acquisition implementation in
 the Linux kernel.
 
-ARMv8 also has the distinction of being the first CPU whose vendor publicly
+\ARMv8 also has the distinction of being the first CPU whose vendor publicly
 defined its memory ordering with an executable formal model~\cite{ARMv8A:2017}.
 
 \subsection{Itanium}
@@ -4287,7 +4287,7 @@ instructions~\cite{IntelItanium02v3}.
 The {\tt acq} modifier prevents subsequent memory-reference instructions
 from being reordered before the {\tt acq}, but permits
 prior memory-reference instructions to be reordered after the {\tt acq},
-similar to the ARMv8 load-acquire instructions.
+similar to the \ARMv8 load-acquire instructions.
 Similarly, the {\tt rel} modifier prevents prior memory-reference
 instructions from being reordered after the {\tt rel}, but allows
 subsequent memory-reference instructions to be reordered before
@@ -4329,12 +4329,12 @@ CPU, including those to the same variable.
 \subsection{MIPS}
 
 The MIPS memory model~\cite[page~479]{MIPSvII-A-2017}
-appears to resemble that of ARM, Itanium, and \Power{},
+appears to resemble that of \ARM, Itanium, and \Power{},
 being weakly ordered by default, but respecting dependencies.
 MIPS has a wide variety of memory-barrier instructions, but ties them
 not to hardware considerations, but rather to the use cases provided
 by the Linux kernel and the C++11 standard~\cite{RichardSmith2015N4527}
-in a manner similar to the ARMv8 additions:
+in a manner similar to the \ARMv8 additions:
 
 \begin{description}[style=nextline]
 \item[\tco{SYNC}]
@@ -4377,7 +4377,7 @@ in a manner similar to the ARMv8 additions:
 
 Informal discussions with MIPS architects indicates that MIPS has a
 definition of transitivity or cumulativity similar to that of
-ARM and \Power{}.
+\ARM\ and \Power{}.
 However, it appears that different MIPS implementations can have
 different memory-ordering properties, so it is important to consult
 the documentation for the specific MIPS implementation you are using.
@@ -4385,8 +4385,7 @@ the documentation for the specific MIPS implementation you are using.
 \subsection{\Power{} / PowerPC}
 \label{sec:memorder:POWER / PowerPC}
 
-The \Power{} and PowerPC\mytextregistered\
-CPU families have a wide variety of memory-barrier
+The \Power{} and PowerPC CPU families have a wide variety of memory-barrier
 instructions~\cite{PowerPC94,MichaelLyons05a}:
 \begin{description}
 \item	[\tco{sync}] causes all preceding operations to {\em appear to have}
@@ -4448,11 +4447,11 @@ fragment itself saw.
 Much more detail is available from
 McKenney and Silvera~\cite{PaulEMcKenneyN2745r2009}.
 
-\Power{} respects control dependencies in much the same way that ARM
+\Power{} respects control dependencies in much the same way that \ARM\
 does, with the exception that the \Power{} \co{isync} instruction
-is substituted for the ARM \co{ISB} instruction.
+is substituted for the \ARM\ \co{ISB} instruction.
 
-Like ARMv8, \Power{} requires \co{smp_mb__after_spinlock()} to be
+Like \ARMv8, \Power{} requires \co{smp_mb__after_spinlock()} to be
 a full memory barrier.
 In addition, \Power{} is the only architecture requiring
 \co{smp_mb__after_unlock_lock()} to be a full memory barrier.
@@ -4595,8 +4594,7 @@ it~\cite[Section 8.1.3]{Intel64IA32v3A2011}.
 
 \subsection{z Systems}
 
-The z~Systems machines make up the IBM\mytexttrademark\
-mainframe family, previously
+The z~Systems machines make up the IBM mainframe family, previously
 known as the 360, 370, 390 and zSeries~\cite{IBMzSeries04a}.
 Parallelism came late to z~Systems, but given that these mainframes first
 shipped in the mid 1960s, this is not saying much.
diff --git a/perfbook.tex b/perfbook.tex
index a5ac180b..f5b57058 100644
--- a/perfbook.tex
+++ b/perfbook.tex
@@ -269,16 +269,14 @@
 \DeclareRobustCommand{\euler}{\ensuremath{\mathrm{e}}}
 \DeclareRobustCommand{\O}[1]{\ensuremath{\mathcal{O}\left(#1\right)}}
 \newcommand{\Power}[1]{POWER#1}
+\newcommand{\ARM}[1]{Arm{#1}}
+\newcommand{\ARMv}[1]{Armv{#1}}
 \newcommand{\GNUC}{GNU~C}
 \newcommand{\GCC}{GCC}
 %\newcommand{\GCC}{\co{gcc}} % For those who prefer "gcc"
 \newcommand{\IRQ}{IRQ}
 %\newcommand{\IRQ}{irq}      % For those who prefer "irq"
 \newcommand{\rt}{\mbox{-rt}} % to prevent line break behind "-"
-\newcommand{\mytexttrademark}{}
-\newcommand{\mytextregistered}{}
-%\newcommand{\mytexttrademark}{\textsuperscript\texttrademark}
-%\newcommand{\mytextregistered}{\textsuperscript\textregistered}
 
 \newcommand{\Epigraph}[2]{\epigraphhead[65]{\epigraph{#1}{#2}}}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/4] memorder: Use \dots for ellipsis
  2020-03-14 15:14 [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture Akira Yokosawa
@ 2020-03-14 15:15 ` Akira Yokosawa
  2020-03-14 15:16 ` [PATCH 3/4] Remove '(R)' and '(TM)' Akira Yokosawa
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Akira Yokosawa @ 2020-03-14 15:15 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

From e0dd23a84a67a2eee322e3d53ccfb5c2d362fd8c Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 14 Mar 2020 19:08:32 +0900
Subject: [PATCH 2/4] memorder: Use \dots for ellipsis

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 memorder/memorder.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 63f1474b..5a9301d0 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -1331,8 +1331,8 @@ in pre-v4.15 Linux kernels.
 
 	To sum up, current platforms either respect address dependencies
 	implicitly, as is the case for TSO platforms (x86, mainframe,
-	SPARC,~...), have hardware tracking for address dependencies
-	(\ARM, PowerPC, MIPS,~...), have the required memory barriers
+	SPARC,~\dots), have hardware tracking for address dependencies
+	(\ARM, PowerPC, MIPS,~\dots), have the required memory barriers
 	supplied by \co{READ_ONCE()} (DEC Alpha in Linux kernel v4.15 and
 	later), or require the memory barriers supplied by
 	\co{rcu_dereference()} (DEC Alpha in Linux kernel v4.14 and earlier).
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/4] Remove '(R)' and '(TM)'
  2020-03-14 15:14 [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture Akira Yokosawa
  2020-03-14 15:15 ` [PATCH 2/4] memorder: Use \dots for ellipsis Akira Yokosawa
@ 2020-03-14 15:16 ` Akira Yokosawa
  2020-03-14 15:17 ` [PATCH 4/4] formal/axiomatic: Move table env next to QQA's first paragraph Akira Yokosawa
  2020-03-14 23:07 ` [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture Paul E. McKenney
  3 siblings, 0 replies; 5+ messages in thread
From: Akira Yokosawa @ 2020-03-14 15:16 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

From b887d5dd9d9c839c74c34f7e7d146ef14b233afb Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 14 Mar 2020 19:08:53 +0900
Subject: [PATCH 3/4] Remove '(R)' and '(TM)'

These were copy'n pasted from /proc/cpuinfo.
They are not supposed to be required in this type of textbook.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 appendix/styleguide/styleguide.tex |  8 ++++----
 cpu/overheads.tex                  | 22 +++++++++++-----------
 cpu/swdesign.tex                   |  6 +++---
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/appendix/styleguide/styleguide.tex b/appendix/styleguide/styleguide.tex
index 00fa249d..b6b343e0 100644
--- a/appendix/styleguide/styleguide.tex
+++ b/appendix/styleguide/styleguide.tex
@@ -1333,14 +1333,14 @@ as a reference to be consulted when new tables are added in the text.
 	Global Comms		& 195 000 000	& 409 500 000   \\
 	\bottomrule
 \end{tabular}
-\caption{CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs @ 2.10GHz}
-\label{tab:app:styleguide:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs @ 2.10GHz}
+\caption{CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz}
+\label{tab:app:styleguide:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz}
 \end{table}
 
 In
-\cref{tab:app:styleguide:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs @ 2.10GHz}
+\cref{tab:app:styleguide:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz}
 (corresponding to
-\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs at 2.10GHz}),
+\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}),
 the ``S'' column specifiers provided
 by the ``siunitx'' package are used to align numbers.
 
diff --git a/cpu/overheads.tex b/cpu/overheads.tex
index e3fa42bf..2bac760f 100644
--- a/cpu/overheads.tex
+++ b/cpu/overheads.tex
@@ -161,13 +161,13 @@ optimization.
 	Global Comms	& 195 000 000 & 409 500 000 & \\
 	\bottomrule
 \end{tabular}
-\caption{CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs @ 2.10\,GHz}
-\label{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs at 2.10GHz}
+\caption{CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10\,GHz}
+\label{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 \end{table*}
 
 The overheads of some common operations important to parallel programs are
 displayed in
-Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs at 2.10GHz}.
+Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}.
 This system's clock period rounds to 0.5\,ns.
 Although it is not unusual for modern microprocessors to be able to
 retire multiple instructions per clock period, the operations' costs are
@@ -234,7 +234,7 @@ That said, the overhead of these operations are similar to single-CPU
 CAS and lock, respectively.
 
 \QuickQuiz{}
-	\Cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs at 2.10GHz}
+	\Cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 	shows CPU~0 sharing a core with CPU~224.
 	Shouldn't that instead be CPU~1???
 \QuickQuizAnswer{
@@ -344,16 +344,16 @@ thousand clock cycles.
 	10\,GHz.
 
 	In addition,
-	Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs at 2.10GHz}
+	Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 	on
-	page~\pageref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs at 2.10GHz}
+	page~\pageref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 	represents a reasonably large system with no fewer 448~hardware
 	threads.
 	Smaller systems often achieve better latency, as may be seen in
 	Table~\ref{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System},
 	which represents a much smaller system with only 16 hardware threads.
 	A similar view is provided by the rows of
-	Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs at 2.10GHz}
+	Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 	down to and including the two ``Off-core'' rows.
 
 \begin{table*}
@@ -385,19 +385,19 @@ thousand clock cycles.
 	Global Comms	& 195 000 000 & 429 000 000 & \\
 	\bottomrule
 \end{tabular}
-\caption{CPU 0 View of Synchronization Mechanisms on 12-CPU Intel(R) Core(TM) i7-8750H CPU @ 2.20\,GHz}
-\label{tab:cpu:CPU 0 View of Synchronization Mechanisms on 12-CPU Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz}
+\caption{CPU 0 View of Synchronization Mechanisms on 12-CPU Intel Core i7-8750H CPU @ 2.20\,GHz}
+\label{tab:cpu:CPU 0 View of Synchronization Mechanisms on 12-CPU Intel Core i7-8750H CPU @ 2.20GHz}
 \end{table*}
 
 	Furthermore, newer small-scale single-socket systems such
 	as the laptop on which I am typing this also have more
 	reasonable latencies, as can be seen in
-	\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 12-CPU Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz}.
+	\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 12-CPU Intel Core i7-8750H CPU @ 2.20GHz}.
 
 	Alternatively, a 64-CPU system in the mid 1990s had
 	cross-interconnect latencies in excess of five microseconds,
 	so even the eight-socket 448-hardware-thread monster shown in
-	Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs at 2.10GHz}
+	Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 	represents more than a five-fold improvement over its
 	25-years-prior counterparts.
 
diff --git a/cpu/swdesign.tex b/cpu/swdesign.tex
index 2a9daa95..c885c3cb 100644
--- a/cpu/swdesign.tex
+++ b/cpu/swdesign.tex
@@ -12,7 +12,7 @@
 	 {\emph{Ella Wheeler Wilcox}}
 
 The values of the ratios in
-Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs at 2.10GHz}
+Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 are critically important, as they limit the
 efficiency of a given parallel application.
 To see this, suppose that the parallel application uses CAS
@@ -50,9 +50,9 @@ be extremely infrequent and to enable very large quantities of processing.
 	\item	Large shared-memory systems tend to have much longer
 		cache-miss latencies than do smaller system.
 		To see this, compare
-		\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel(R) Xeon(R) Platinum 8176 CPUs at 2.10GHz}
+		\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 		with
-		\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 12-CPU Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz}.
+		\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 12-CPU Intel Core i7-8750H CPU @ 2.20GHz}.
 	\item	The distributed-systems communications latencies do
 		not necessarily consume the CPU, which can often allow
 		computation to proceed in parallel with message transfer.
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 4/4] formal/axiomatic: Move table env next to QQA's first paragraph
  2020-03-14 15:14 [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture Akira Yokosawa
  2020-03-14 15:15 ` [PATCH 2/4] memorder: Use \dots for ellipsis Akira Yokosawa
  2020-03-14 15:16 ` [PATCH 3/4] Remove '(R)' and '(TM)' Akira Yokosawa
@ 2020-03-14 15:17 ` Akira Yokosawa
  2020-03-14 23:07 ` [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture Paul E. McKenney
  3 siblings, 0 replies; 5+ messages in thread
From: Akira Yokosawa @ 2020-03-14 15:17 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

From 530d73392e389c1775946078391e9f017a4d9dad Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 14 Mar 2020 19:33:42 +0900
Subject: [PATCH 4/4] formal/axiomatic: Move table env next to QQA's first paragraph

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 formal/axiomatic.tex | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/formal/axiomatic.tex b/formal/axiomatic.tex
index 26e3f1ac..1565f413 100644
--- a/formal/axiomatic.tex
+++ b/formal/axiomatic.tex
@@ -188,6 +188,22 @@ And in this case, the \co{herd} tool's output features the string
 	Why bother modeling locking directly?
 	Why not simply emulate locking with atomic operations?
 \QuickQuizAnswer{
+	In a word, performance, as can be seen in
+	Table~\ref{tab:formal:Locking: Modeling vs. Emulation Time (s)}.
+	The first column shows the number of herd processes modeled.
+	The second column shows the herd runtime when modeling
+	\co{spin_lock()} and \co{spin_unlock()} directly in herd's
+	cat language.
+	The third column shows the herd runtime when emulating
+	\co{spin_lock()} with \co{cmpxchg_acquire()} and \co{spin_unlock()}
+	with \co{smp_store_release()}, using the herd \co{filter}
+	clause to reject executions that fail to acquire the lock.
+	The fourth column is like the third, but using \co{xchg_acquire()}
+	instead of \co{cmpxchg_acquire()}.
+	The fifth and sixth columns are like the third and fourth,
+	but instead using the herd \co{exists} clause to reject
+	executions that fail to acquire the lock.
+
 \begin{table}[tb]
 \rowcolors{2}{}{lightgray}
 \small
@@ -210,21 +226,6 @@ And in this case, the \co{herd} tool's output features the string
 \caption{Locking: Modeling vs. Emulation Time (s)}
 \label{tab:formal:Locking: Modeling vs. Emulation Time (s)}
 \end{table}
-	In a word, performance, as can be seen in
-	Table~\ref{tab:formal:Locking: Modeling vs. Emulation Time (s)}.
-	The first column shows the number of herd processes modeled.
-	The second column shows the herd runtime when modeling
-	\co{spin_lock()} and \co{spin_unlock()} directly in herd's
-	cat language.
-	The third column shows the herd runtime when emulating
-	\co{spin_lock()} with \co{cmpxchg_acquire()} and \co{spin_unlock()}
-	with \co{smp_store_release()}, using the herd \co{filter}
-	clause to reject executions that fail to acquire the lock.
-	The fourth column is like the third, but using \co{xchg_acquire()}
-	instead of \co{cmpxchg_acquire()}.
-	The fifth and sixth columns are like the third and fourth,
-	but instead using the herd \co{exists} clause to reject
-	executions that fail to acquire the lock.
 
 	Note also that use of the \co{filter} clause is about twice
 	as fast as is use of the \co{exists} clause.
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture
  2020-03-14 15:14 [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture Akira Yokosawa
                   ` (2 preceding siblings ...)
  2020-03-14 15:17 ` [PATCH 4/4] formal/axiomatic: Move table env next to QQA's first paragraph Akira Yokosawa
@ 2020-03-14 23:07 ` Paul E. McKenney
  3 siblings, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2020-03-14 23:07 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Sun, Mar 15, 2020 at 12:14:01AM +0900, Akira Yokosawa wrote:
> >From d90bfe96b76995a125c9ba0f8461d5f0acc138ec Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa <akiyks@gmail.com>
> Date: Sat, 14 Mar 2020 00:06:33 +0900
> Subject: [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture
> 
> Substitute "Arm" for "ARM" to respect the decision of Arm Limited.
> Instead of direct substitutions, define macros \ARM and \ARMv.
> This should help us easily catch up in case Arm changes its mind.
> 
> Note that as far as an argument of a macro is a single digit,
> enclosing it by "{}" is not necessary.
> For example, \ARMv{8} CPU" and "\ARMv8 CPU" will generate the same
> result: "Armv8".
> 
> Some of "ARM" in ppcmem.tex are kept unchanged as the PPCMEM site
> stilluses "ARM" as its interface choice.
> 
> Also update the legal page to mention trademarks of Arm, MIPS,
> and SPARC.  Update the notice on Intel trademarks as well.
> "x386" is not a trademark of Intel anymore.
> 
> While we are here, get rid of \mytexttrademark and \mytextregistered
> as they have been empty ever since they were introduced in commit
> eecdeac7367c ("Remove trademark and registered symbols in text").
> 
> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

Queued and pushed all four, thank you!

							Thanx, Paul

> ---
>  SMPdesign/beyond.tex      |  6 ++--
>  datastruct/datastruct.tex |  5 ++-
>  formal/axiomatic.tex      |  2 +-
>  formal/ppcmem.tex         | 22 ++++++-------
>  future/formalregress.tex  |  2 +-
>  legal.tex                 | 12 +++++--
>  memorder/memorder.tex     | 66 +++++++++++++++++++--------------------
>  perfbook.tex              |  6 ++--
>  8 files changed, 61 insertions(+), 60 deletions(-)
> 
> diff --git a/SMPdesign/beyond.tex b/SMPdesign/beyond.tex
> index 12e9237a..2eac3150 100644
> --- a/SMPdesign/beyond.tex
> +++ b/SMPdesign/beyond.tex
> @@ -218,8 +218,7 @@ attempts to record cells in the \co{->visited[]} array.
>  \end{fcvref}
>  
>  This approach does provide significant speedups on a dual-CPU
> -Lenovo\mytexttrademark\ W500
> -running at 2.53\,GHz, as shown in
> +Lenovo W500 running at 2.53\,GHz, as shown in
>  Figure~\ref{fig:SMPdesign:CDF of Solution Times For SEQ and PWQ},
>  which shows the cumulative distribution functions (CDFs) for the solution
>  times of the two algorithms, based on the solution of 500 different square
> @@ -602,8 +601,7 @@ the solution line.
>  This disappointing performance compared to results in
>  Figure~\ref{fig:SMPdesign:Varying Maze Size vs. COPART}
>  is due to the less-tightly integrated hardware available in the
> -larger and older Xeon\mytextregistered\
> -system running at 2.66\,GHz.
> +larger and older Xeon system running at 2.66\,GHz.
>  
>  \subsection{Future Directions and Conclusions}
>  \label{sec:SMPdesign:Future Directions and Conclusions}
> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> index e21952bc..ad9a80f2 100644
> --- a/datastruct/datastruct.tex
> +++ b/datastruct/datastruct.tex
> @@ -319,9 +319,8 @@ The \co{hashtab_free()} function on
>  \end{figure}
>  
>  The performance results for an eight-CPU 2\,GHz
> -Intel\mytextregistered\
> -Xeon\mytextregistered\
> -system using a bucket-locked hash table with 1024 buckets are shown in
> +Intel Xeon system using a bucket-locked hash table
> +with 1024 buckets are shown in
>  Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo}.
>  The performance does scale nearly linearly, but is not much more than half
>  of the ideal performance level, even at only eight CPUs.
> diff --git a/formal/axiomatic.tex b/formal/axiomatic.tex
> index 10bb33e9..26e3f1ac 100644
> --- a/formal/axiomatic.tex
> +++ b/formal/axiomatic.tex
> @@ -419,7 +419,7 @@ in the \co{herd} output.
>  	\begin{fcvref}[ln:formal:C-RomanPenyaev-list-rcu-rr:whole]
>  	That is an excellent question.
>  	As of late 2018, the answer is ``no one knows''.
> -	Much depends on the semantics of ARMv8's conditional-move
> +	Much depends on the semantics of \ARMv8's conditional-move
>  	instruction.
>  	While awaiting clarity on these semantics, \co{smp_store_release()}
>  	is the safe choice.
> diff --git a/formal/ppcmem.tex b/formal/ppcmem.tex
> index 184cdd0d..d88e16c2 100644
> --- a/formal/ppcmem.tex
> +++ b/formal/ppcmem.tex
> @@ -29,9 +29,9 @@ Peter Sewell and Susmit Sarkar at the University of Cambridge, Luc
>  Maranget, Francesco Zappa Nardelli, and Pankaj Pawan at INRIA, and Jade
>  Alglave at Oxford University, in cooperation with Derek Williams of
>  IBM~\cite{JadeAlglave2011ppcmem}.
> -This group formalized the memory models of Power, ARM, x86, as well
> +This group formalized the memory models of Power, \ARM, x86, as well
>  as that of the C/C++11 standard~\cite{PeteBecker2011N3242}, and
> -produced the PPCMEM tool based on the Power and ARM formalizations.
> +produced the PPCMEM tool based on the Power and \ARM\ formalizations.
>  
>  \QuickQuiz{}
>  	But x86 has strong memory ordering!  Why would you need to
> @@ -60,7 +60,7 @@ discusses the implications.
>  
>  An example PowerPC litmus test for PPCMEM is shown in
>  \cref{lst:formal:PPCMEM Litmus Test}.
> -The ARM interface works exactly the same way, but with ARM instructions
> +The ARM interface works exactly the same way, but with \ARM\ instructions
>  substituted for the Power instructions and with the initial ``PPC''
>  replaced by ``ARM''. You can select the ARM interface by clicking on
>  ``Change to ARM Model'' at the web page called out above.
> @@ -164,7 +164,7 @@ runs tests on actual hardware. Perhaps more importantly, a large number of
>  pre-existing litmus tests are available with the online tool (available
>  via the ``Select ARM Test'' and ``Select POWER Test'' buttons). It is
>  quite likely that one of these pre-existing litmus tests will answer
> -your Power or ARM memory-ordering question.
> +your Power or \ARM\ memory-ordering question.
>  
>  \subsection{What Does This Litmus Test Mean?}
>  \label{sec:formal:What Does This Litmus Test Mean?}
> @@ -173,8 +173,8 @@ P0's \clnref{reginit,stw} are equivalent to the C statement \co{x=1}
>  because \clnref{init:0} defines P0's register \co{r2} to be the address
>  of \co{x}. P0's \clnref{P0lwarx,P0stwcx} are the mnemonics for
>  load-linked (``load register
> -exclusive'' in ARM parlance and ``load reserve'' in Power parlance)
> -and store-conditional (``store register exclusive'' in ARM parlance),
> +exclusive'' in \ARM\ parlance and ``load reserve'' in Power parlance)
> +and store-conditional (``store register exclusive'' in \ARM\ parlance),
>  respectively. When these are used together, they form an atomic
>  instruction sequence, roughly similar to the compare-and-swap sequences
>  exemplified by the x86 \co{lock;cmpxchg} instruction. Moving to a higher
> @@ -319,9 +319,9 @@ Therefore, the model predicts that the offending execution sequence
>  cannot happen.
>  
>  \QuickQuiz{}
> -	Does the ARM Linux kernel have a similar bug?
> +	Does the \ARM\ Linux kernel have a similar bug?
>  \QuickQuizAnswer{
> -	ARM does not have this particular bug because that it places
> +	\ARM\ does not have this particular bug because it places
>  	\co{smp_mb()} before and after the \co{atomic_add_return()}
>  	function's assembly-language implementation.
>  	PowerPC no longer has this bug; it has long since been
> @@ -363,12 +363,12 @@ cannot happen.
>  \label{sec:formal:PPCMEM Discussion}
>  
>  These tools promise to be of great help to people working on low-level
> -parallel primitives that run on ARM and on Power. These tools do have
> +parallel primitives that run on \ARM\ and on Power. These tools do have
>  some intrinsic limitations:
>  
>  \begin{enumerate}
>  \item	These tools are research prototypes, and as such are unsupported.
> -\item	These tools do not constitute official statements by IBM or ARM
> +\item	These tools do not constitute official statements by IBM or \ARM\
>  	on their respective CPU architectures. For example, both
>  	corporations reserve the right to report a bug at any time against
>  	any version of any of these tools. These tools are therefore not a
> @@ -383,7 +383,7 @@ some intrinsic limitations:
>  	may vary. In particular, the tool handles only word-sized accesses
>  	(32 bits), and the words accessed must be properly aligned. In
>  	addition, the tool does not handle some of the weaker variants
> -	of the ARM memory-barrier instructions, nor does it handle arithmetic.
> +	of the \ARM\ memory-barrier instructions, nor does it handle arithmetic.
>  \item	The tools are restricted to small loop-free code fragments
>  	running on small numbers of threads. Larger examples result
>  	in state-space explosion, just as with similar tools such as
> diff --git a/future/formalregress.tex b/future/formalregress.tex
> index 9bfcdd2d..1677c722 100644
> --- a/future/formalregress.tex
> +++ b/future/formalregress.tex
> @@ -125,7 +125,7 @@ good match for modern computer systems, as was seen in
>  \cref{chp:Advanced Synchronization: Memory Ordering}.
>  In contrast, one of the great strengths of PPCMEM and \co{herd}
>  is their detailed modeling of various CPU families memory models,
> -including x86, ARM, Power, and, in the case of \co{herd},
> +including x86, \ARM, Power, and, in the case of \co{herd},
>  even a Linux-kernel memory model~\cite{Alglave:2018:FSC:3173162.3177156},
>  which has been accepted into version 4.17 of
>  the Linux kernel.
> diff --git a/legal.tex b/legal.tex
> index 21b9263f..d4648c7f 100644
> --- a/legal.tex
> +++ b/legal.tex
> @@ -13,8 +13,16 @@ Trademarks:
>  	of International Business Machines Corporation in the United
>  	States, other countries, or both.
>  \item	Linux is a registered trademark of Linus Torvalds.
> -\item	i386 is a trademark of Intel Corporation or its subsidiaries
> -	in the United States, other countries, or both.
> +\item	Intel, Itanium, Intel Core, and Intel Xeon are trademarks
> +	of Intel Corporation or its subsidiaries in the United States,
> +	other countries, or both.
> +\item	Arm is a registered trademark of Arm Limited (or its subsidiaries)
> +	in the US and/or elsewhere.
> +\item	MIPS is a registered trademark of Wave, Inc. in the United States
> +	and other countries.
> +\item	SPARC is a registered trademark of SPARC International, Inc.
> +	Products bearing SPARC trademarks are based on an architecture
> +	developed by Sun Microsystems, Inc.
>  \item	Other company, product, and service names may be trademarks or
>  	service marks of such companies.
>  \end{itemize}
> diff --git a/memorder/memorder.tex b/memorder/memorder.tex
> index 01355730..63f1474b 100644
> --- a/memorder/memorder.tex
> +++ b/memorder/memorder.tex
> @@ -330,7 +330,7 @@ synchronization primitives (such as locking and RCU)
>  that are responsible for maintaining the illusion of ordering through use of
>  \emph{memory barriers} (for example, \co{smp_mb()} in the Linux kernel).
>  These memory barriers can be explicit instructions, as they are on
> -ARM, \Power{}, Itanium, and Alpha, or they can be implied by other instructions,
> +\ARM, \Power{}, Itanium, and Alpha, or they can be implied by other instructions,
>  as they often are on x86.
>  Since these standard synchronization primitives preserve the illusion of
>  ordering, your path of least resistance is to simply use these primitives,
> @@ -1332,7 +1332,7 @@ in pre-v4.15 Linux kernels.
>  	To sum up, current platforms either respect address dependencies
>  	implicitly, as is the case for TSO platforms (x86, mainframe,
>  	SPARC,~...), have hardware tracking for address dependencies
> -	(ARM, PowerPC, MIPS,~...), have the required memory barriers
> +	(\ARM, PowerPC, MIPS,~...), have the required memory barriers
>  	supplied by \co{READ_ONCE()} (DEC Alpha in Linux kernel v4.15 and
>  	later), or require the memory barriers supplied by
>  	\co{rcu_dereference()} (DEC Alpha in Linux kernel v4.14 and earlier).
> @@ -1582,7 +1582,7 @@ instead provided the slightly weaker
>  \emph{other-multicopy atomicity}~\cite[Section B2.3]{ARMv8A:2017},
>  which excludes the CPU doing a given store from the requirement that all
>  CPUs agree on the order of all stores.\footnote{
> -	As of late 2018, ARMv8 and x86 provide other-multicopy atomicity,
> +	As of late 2018, \ARMv8 and x86 provide other-multicopy atomicity,
>  	IBM mainframe provides fully multicopy atomicity, and PPC does
>  	not provide multicopy atomicity at all. More detail is shown in
>  	\cref{tab:memorder:Summary of Memory Ordering}.}
> @@ -2071,7 +2071,7 @@ This should not come as a surprise to anyone who carefully examined
>  	\cref{lst:memorder:2+2W Litmus Test With Write Barriers}
>  	(\path{C-2+2W+o-wmb-o+o-wmb-o.litmus}),
>  	research shows that the cycle is prohibited, even in weakly
> -	ordered systems such as ARM and Power~\cite{test6-pdf}.
> +	ordered systems such as \ARM\ and Power~\cite{test6-pdf}.
>  	Given that, are store-to-store really \emph{always}
>  	counter-temporal???
>  \QuickQuizAnswer{
> @@ -3720,8 +3720,8 @@ to what can be done based on individual memory accesses.
>  	\cmidrule{3-11}
>  	\multicolumn{2}{c}{\raisebox{.5ex}{Property}}
>  	& \cpufml{Alpha}
> -	& \cpufml{ARMv7-A/R}
> -	& \cpufml{ARMv8}
> +	& \cpufml{\ARMv7-A/R}
> +	& \cpufml{\ARMv8}
>  	& \cpufml{Itanium}
>  	& \cpufml{MIPS}
>  	& \cpufml{\Power{}}
> @@ -4041,7 +4041,7 @@ in Alpha's heyday.
>  One could place an \co{smp_rmb()} primitive
>  between the pointer fetch and dereference in order to force Alpha
>  to order the pointer fetch with the later dependent load.
> -However, this imposes unneeded overhead on systems (such as ARM,
> +However, this imposes unneeded overhead on systems (such as \ARM,
>  Itanium, PPC, and SPARC) that respect data dependencies on the read side.
>  A \co{smp_read_barrier_depends()} primitive has therefore been added to the
>  Linux kernel to eliminate overhead on these systems, and was also added
> @@ -4150,13 +4150,13 @@ an \co{smp_mb()} rather than a no-op.
>  
>  For more on Alpha, see its reference manual~\cite{ALPHA2002}.
>  
> -\subsection{ARMv7-A/R}
> +\subsection{\ARMv7-A/R}
>  \label{sec:memorder:ARMv7-A/R}
>  
> -The ARM family of CPUs is extremely popular in embedded applications,
> +The \ARM\ family of CPUs is extremely popular in embedded applications,
>  particularly for power-constrained applications such as cellphones.
>  Its memory model is similar to that of \Power{}
> -(see \cref{sec:memorder:POWER / PowerPC}), but ARM uses a
> +(see \cref{sec:memorder:POWER / PowerPC}), but \ARM\ uses a
>  different set of memory-barrier instructions~\cite{ARMv7A:2010}:
>  
>  \begin{description}
> @@ -4166,7 +4166,7 @@ different set of memory-barrier instructions~\cite{ARMv7A:2010}:
>  	The ``type'' of operations can be all operations or can be
>  	restricted to only writes (similar to the Alpha \co{wmb}
>  	and the \Power{} \co{eieio} instructions).
> -	In addition, ARM allows cache coherence to have one of three
> +	In addition, \ARM\ allows cache coherence to have one of three
>  	scopes: single processor, a subset of the processors
>  	(``inner'') and global (``outer'').
>  \item	[\tco{DSB}] (data synchronization barrier) causes the specified
> @@ -4175,7 +4175,7 @@ different set of memory-barrier instructions~\cite{ARMv7A:2010}:
>  	The ``type'' of operations is the same as that of \co{DMB}.
>  	The \co{DSB} instruction was called \co{DWB} (drain write buffer
>  	or data write barrier, your choice) in early versions of the
> -	ARM architecture.
> +	\ARM\ architecture.
>  \item	[\tco{ISB}] (instruction synchronization barrier) flushes the CPU
>  	pipeline, so that all instructions following the \co{ISB}
>  	are fetched only after the \co{ISB} completes.
> @@ -4194,7 +4194,7 @@ stronger than
>  \cref{sec:memorder:Cumulativity}'s
>  variant of cumulativity.
>  
> -ARM also implements control dependencies, so that if a conditional
> +\ARM\ also implements control dependencies, so that if a conditional
>  branch depends on a load, then any store executed after that conditional
>  branch will be ordered after the load.
>  However, loads following the conditional branch will \emph{not}
> @@ -4218,7 +4218,7 @@ r3 = z;			\lnlbl[z2]
>  In this example, load-store control dependency ordering causes
>  the load from \co{x} on \clnref{x} to be ordered before the store to
>  \co{y} on \clnref{y}.
> -However, ARM does not respect load-load control dependencies, so that
> +However, \ARM\ does not respect load-load control dependencies, so that
>  the load on \clnref{x} might well happen \emph{after} the
>  load on \clnref{z1}.
>  On the other hand, the combination of the conditional branch on \clnref{if}
> @@ -4228,7 +4228,7 @@ Note that inserting an additional \co{ISB} instruction somewhere between
>  \clnref{nop,y} would enforce ordering between \clnref{x,z1}.
>  \end{fcvref}
>  
> -\subsection{ARMv8}
> +\subsection{\ARMv8}
>  
>  \begin{figure}[tb]
>  \centering
> @@ -4237,29 +4237,29 @@ Note that inserting an additional \co{ISB} instruction somewhere between
>  \ContributedBy{Figure}{fig:memorder:Half Memory Barrier}{Melissa Brossard}
>  \end{figure}
>  
> -ARMv8 is ARM's new CPU family~\cite{ARMv8A:2017}
> +\ARMv8 is \ARM's new CPU family~\cite{ARMv8A:2017}
>  which includes 64-bit capabilities,
>  in contrast to their 32-bit-only CPU described in
>  \cref{sec:memorder:ARMv7-A/R}.
> -ARMv8's memory model closely resembles its ARMv7 counterpart,
> +\ARMv8's memory model closely resembles its \ARMv7 counterpart,
>  but adds load-acquire (\co{LDLARB}, \co{LDLARH}, and \co{LDLAR})
>  and store-release (\co{STLLRB}, \co{STLLRH}, and \co{STLLR})
>  instructions.
>  These instructions act as ``half memory barriers'', so that
> -ARMv8 CPUs can reorder previous accesses with a later \co{LDLAR}
> +\ARMv8 CPUs can reorder previous accesses with a later \co{LDLAR}
>  instruction, but are prohibited from reordering an earlier \co{LDLAR}
>  instruction with later accesses, as fancifully depicted in
>  \cref{fig:memorder:Half Memory Barrier}.
> -Similarly, ARMv8 CPUs can reorder an earlier \co{STLLR} instruction with
> +Similarly, \ARMv8 CPUs can reorder an earlier \co{STLLR} instruction with
>  a subsequent access, but are prohibited from reordering
>  previous accesses with a later \co{STLLR} instruction.
>  As one might expect, this means that these instructions directly support
>  the C11 notion of load-acquire and store-release.
>  
> -However, ARMv8 goes well beyond the C11 memory model by mandating that
> +However, \ARMv8 goes well beyond the C11 memory model by mandating that
>  the combination of a store-release and load-acquire act as a full
>  barrier under many circumstances.
> -For example, in ARMv8, given a store followed by a store-release followed
> +For example, in \ARMv8, given a store followed by a store-release followed
>  a load-acquire followed by a load, all to different variables and all from
>  a single CPU, all CPUs
>  would agree that the initial store preceded the final load.
> @@ -4267,12 +4267,12 @@ Interestingly enough, most TSO architectures (including x86 and the
>  mainframe) do not make this guarantee, as the two loads could be
>  reordered before the two stores.
>  
> -ARMv8 is one of only two architectures that needs the
> +\ARMv8 is one of only two architectures that needs the
>  \co{smp_mb__after_spinlock()} primitive to be a full barrier,
>  due to its relatively weak lock-acquisition implementation in
>  the Linux kernel.
>  
> -ARMv8 also has the distinction of being the first CPU whose vendor publicly
> +\ARMv8 also has the distinction of being the first CPU whose vendor publicly
>  defined its memory ordering with an executable formal model~\cite{ARMv8A:2017}.
>  
>  \subsection{Itanium}
> @@ -4287,7 +4287,7 @@ instructions~\cite{IntelItanium02v3}.
>  The {\tt acq} modifier prevents subsequent memory-reference instructions
>  from being reordered before the {\tt acq}, but permits
>  prior memory-reference instructions to be reordered after the {\tt acq},
> -similar to the ARMv8 load-acquire instructions.
> +similar to the \ARMv8 load-acquire instructions.
>  Similarly, the {\tt rel} modifier prevents prior memory-reference
>  instructions from being reordered after the {\tt rel}, but allows
>  subsequent memory-reference instructions to be reordered before
> @@ -4329,12 +4329,12 @@ CPU, including those to the same variable.
>  \subsection{MIPS}
>  
>  The MIPS memory model~\cite[page~479]{MIPSvII-A-2017}
> -appears to resemble that of ARM, Itanium, and \Power{},
> +appears to resemble that of \ARM, Itanium, and \Power{},
>  being weakly ordered by default, but respecting dependencies.
>  MIPS has a wide variety of memory-barrier instructions, but ties them
>  not to hardware considerations, but rather to the use cases provided
>  by the Linux kernel and the C++11 standard~\cite{RichardSmith2015N4527}
> -in a manner similar to the ARMv8 additions:
> +in a manner similar to the \ARMv8 additions:
>  
>  \begin{description}[style=nextline]
>  \item[\tco{SYNC}]
> @@ -4377,7 +4377,7 @@ in a manner similar to the ARMv8 additions:
>  
>  Informal discussions with MIPS architects indicates that MIPS has a
>  definition of transitivity or cumulativity similar to that of
> -ARM and \Power{}.
> +\ARM\ and \Power{}.
>  However, it appears that different MIPS implementations can have
>  different memory-ordering properties, so it is important to consult
>  the documentation for the specific MIPS implementation you are using.
> @@ -4385,8 +4385,7 @@ the documentation for the specific MIPS implementation you are using.
>  \subsection{\Power{} / PowerPC}
>  \label{sec:memorder:POWER / PowerPC}
>  
> -The \Power{} and PowerPC\mytextregistered\
> -CPU families have a wide variety of memory-barrier
> +The \Power{} and PowerPC CPU families have a wide variety of memory-barrier
>  instructions~\cite{PowerPC94,MichaelLyons05a}:
>  \begin{description}
>  \item	[\tco{sync}] causes all preceding operations to {\em appear to have}
> @@ -4448,11 +4447,11 @@ fragment itself saw.
>  Much more detail is available from
>  McKenney and Silvera~\cite{PaulEMcKenneyN2745r2009}.
>  
> -\Power{} respects control dependencies in much the same way that ARM
> +\Power{} respects control dependencies in much the same way that \ARM\
>  does, with the exception that the \Power{} \co{isync} instruction
> -is substituted for the ARM \co{ISB} instruction.
> +is substituted for the \ARM\ \co{ISB} instruction.
>  
> -Like ARMv8, \Power{} requires \co{smp_mb__after_spinlock()} to be
> +Like \ARMv8, \Power{} requires \co{smp_mb__after_spinlock()} to be
>  a full memory barrier.
>  In addition, \Power{} is the only architecture requiring
>  \co{smp_mb__after_unlock_lock()} to be a full memory barrier.
> @@ -4595,8 +4594,7 @@ it~\cite[Section 8.1.3]{Intel64IA32v3A2011}.
>  
>  \subsection{z Systems}
>  
> -The z~Systems machines make up the IBM\mytexttrademark\
> -mainframe family, previously
> +The z~Systems machines make up the IBM mainframe family, previously
>  known as the 360, 370, 390 and zSeries~\cite{IBMzSeries04a}.
>  Parallelism came late to z~Systems, but given that these mainframes first
>  shipped in the mid 1960s, this is not saying much.
> diff --git a/perfbook.tex b/perfbook.tex
> index a5ac180b..f5b57058 100644
> --- a/perfbook.tex
> +++ b/perfbook.tex
> @@ -269,16 +269,14 @@
>  \DeclareRobustCommand{\euler}{\ensuremath{\mathrm{e}}}
>  \DeclareRobustCommand{\O}[1]{\ensuremath{\mathcal{O}\left(#1\right)}}
>  \newcommand{\Power}[1]{POWER#1}
> +\newcommand{\ARM}[1]{Arm{#1}}
> +\newcommand{\ARMv}[1]{Armv{#1}}
>  \newcommand{\GNUC}{GNU~C}
>  \newcommand{\GCC}{GCC}
>  %\newcommand{\GCC}{\co{gcc}} % For those who prefer "gcc"
>  \newcommand{\IRQ}{IRQ}
>  %\newcommand{\IRQ}{irq}      % For those who prefer "irq"
>  \newcommand{\rt}{\mbox{-rt}} % to prevent line break behind "-"
> -\newcommand{\mytexttrademark}{}
> -\newcommand{\mytextregistered}{}
> -%\newcommand{\mytexttrademark}{\textsuperscript\texttrademark}
> -%\newcommand{\mytextregistered}{\textsuperscript\textregistered}
>  
>  \newcommand{\Epigraph}[2]{\epigraphhead[65]{\epigraph{#1}{#2}}}
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-03-15  1:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-14 15:14 [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture Akira Yokosawa
2020-03-14 15:15 ` [PATCH 2/4] memorder: Use \dots for ellipsis Akira Yokosawa
2020-03-14 15:16 ` [PATCH 3/4] Remove '(R)' and '(TM)' Akira Yokosawa
2020-03-14 15:17 ` [PATCH 4/4] formal/axiomatic: Move table env next to QQA's first paragraph Akira Yokosawa
2020-03-14 23:07 ` [PATCH 1/4] Use 'Arm' as text trademark of Arm architecture Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.